I’ve always been a fan of converting model outputs to real-life quantities of interest. For example, I like to supplement a logistic regression model table with predicted probabilities for a given set of explanatory variable levels. This can be more intuitive than odds ratios, particularly for a lay audience.

For example, say I have run a logistic regression model for predicted 5 year survival after colon cancer. What is the actual probability of death for a patient under 40 with a small cancer that has not perforated? How does that probability differ for a patient over 40?

I’ve tried this various ways. I used Zelig for a while including here, but it started trying to do too much and was always broken (I updated it the other day in the hope that things were better, but was met with a string of errors again).

I also used rms, including here (check out the nice plots!). I like and respect the package, but I don’t use it as standard, so I need to convert all the models first, e.g. to `lrm`. Again, for my needs it tries to do too much, and I find `datadist` awkward.

Thirdly, I love Stan for this, e.g. used in this paper. The `generated quantities` block allows great flexibility to simulate whatever you wish from the posterior. I’m a Bayesian at heart and will always come back to this. But for some applications it’s a bit much, and takes some time to get running as I want.

I often simply want to `predict` `y-hat` from `lm` and `glm` with bootstrapped intervals, and ideally a comparison of sets of explanatory levels, just like `sim` does in `Zelig`. But I want it in a format I can immediately use in a publication.
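For orientation, here is a rough sketch of doing this by hand with base R and the `boot` package rather than `finalfit`; the formula, covariate patterns and helper function are illustrative assumptions built on the `colon_s` example data used below.

```r
library(boot)
library(finalfit)   # provides the colon_s example data

# Logistic regression for 5-year mortality (illustrative model only)
fit = glm(mort_5yr ~ age.factor + extent.factor + perfor.factor,
          data = colon_s, family = binomial)

# Two illustrative covariate patterns
new = data.frame(age.factor    = "<40 years",
                 extent.factor = c("Submucosa", "Adjacent structures"),
                 perfor.factor = "No")

# Refit the model on each bootstrap resample and predict for `new`
boot_fun = function(data, indices) {
  refit = glm(mort_5yr ~ age.factor + extent.factor + perfor.factor,
              data = data[indices, ], family = binomial)
  predict(refit, newdata = new, type = "response")
}

boot_out = boot(colon_s, boot_fun, R = 100)   # use many more replicates in practice
t(apply(boot_out$t, 2, quantile, probs = c(0.025, 0.5, 0.975)))
```

It works, but it is fiddly to repeat for every model, and the output still needs manual tidying before it can go in a paper.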

Well, now I can with `finalfit`.

You need to use the GitHub version of the package until CRAN is updated:

```r
devtools::install_github("ewenharrison/finalfit")
```

There are two main functions, with some new internals to help expand to other models in the future.

`finalfit_newdata` is used to generate a new dataframe. I usually want to set 4 or 5 combinations of `x` levels and often find it difficult to get this formatted for `predict`. Pass the original dataset, the names of the explanatory variables used in the model, and a list of levels for these. For the latter, they can be included as rows or columns. If the data type is incorrect or you try to pass factor levels that don’t exist, it will fail with a useful warning.

```r
library(finalfit)
explanatory = c("age.factor", "extent.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>% 
  finalfit_newdata(explanatory = explanatory, newdata = list(
    c("<40 years", "Submucosa",           "No"),
    c("<40 years", "Submucosa",           "Yes"),
    c("<40 years", "Adjacent structures", "No"),
    c("<40 years", "Adjacent structures", "Yes")
  )) -> newdata

newdata
```
```
  age.factor       extent.factor perfor.factor
1  <40 years           Submucosa            No
2  <40 years           Submucosa           Yes
3  <40 years Adjacent structures            No
4  <40 years Adjacent structures           Yes
```

`boot_predict` takes standard `lm` and `glm` model objects, together with `finalfit` `lmlist` and `glmlist` objects from fitters, e.g. `lmmulti` and `glmmulti`. In addition, it requires a `newdata` object generated from `finalfit_newdata`. If you’re new to this, don’t be put off by all those model acronyms; it is straightforward.

```r
colon_s %>% 
  glmmulti(dependent, explanatory) %>% 
  boot_predict(newdata,
    estimate_name = "Predicted probability of death",
    R = 100, boot_compare = FALSE, digits = c(2, 3))
```
```
        Age    Extent of spread Perforation Predicted probability of death
1 <40 years           Submucosa          No            0.28 (0.00 to 0.52)
2 <40 years           Submucosa         Yes            0.29 (0.00 to 0.61)
3 <40 years Adjacent structures          No            0.71 (0.50 to 0.86)
4 <40 years Adjacent structures         Yes            0.72 (0.45 to 0.89)
```

Note that the number of simulations (R) here is low for demonstration purposes. You should expect to use 1000 to 10000 to ensure you have stable estimates.

Simulations are produced using bootstrapping and everything is tidily output in a table/dataframe, which can be passed to `knitr::kable`.

Within an `.Rmd` file:

```{r}
knitr::kable(table, row.names = FALSE, align = c("l", "l", "l", "r"))
```

Better still, by including `boot_compare = TRUE`, comparisons are made between the first row of `newdata` and each subsequent row. These can be first differences (e.g. absolute risk differences) or ratios (e.g. relative risk ratios). The comparisons are done on the individual bootstrap predictions and the distribution is summarised as a mean with percentile confidence intervals (95% CI by default, i.e. the 2.5 and 97.5 percentiles). A p-value is generated from the proportion of values on the other side of the null from the mean, e.g. for a ratio greater than 1.0, p is the proportion of bootstrapped predictions under 1.0, multiplied by two so it is two-sided. (Sorry about including a p-value.)

Scroll right here:

```r
colon_s %>% 
  glmmulti(dependent, explanatory) %>% 
  boot_predict(newdata,
    estimate_name = "Predicted probability of death",
    compare_name = "Absolute risk difference",
    R = 100, digits = c(2, 3))
```
```
        Age    Extent of spread Perforation Predicted probability of death       Absolute risk difference
1 <40 years           Submucosa          No            0.28 (0.00 to 0.52)                              -
2 <40 years           Submucosa         Yes            0.29 (0.00 to 0.62)  0.01 (-0.15 to 0.20, p=0.920)
3 <40 years Adjacent structures          No            0.71 (0.56 to 0.89)   0.43 (0.19 to 0.68, p<0.001)
4 <40 years Adjacent structures         Yes            0.72 (0.45 to 0.91)   0.43 (0.11 to 0.73, p<0.001)
```

It doesn’t yet include our other common models, such as `coxph`, which I may add in. It doesn’t do `lmer` or `glmer` either. `bootMer` works well for mixed-effects models, but these take a bit more care and thought, e.g. how are random effects to be handled in the simulations? So I don’t have immediate plans to add that in; it is better done directly, as sketched below.
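For completeness, a rough sketch of the direct `lme4::bootMer` route might look like this; the model and `newdata` use the `cbpp` example data shipped with lme4 and are illustrative assumptions, not part of `finalfit`:

```r
library(lme4)

# Example mixed-effects logistic regression from the lme4 documentation
fit = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

newdata = data.frame(period = factor(1:4))

# Predict population-level probabilities, ignoring the random effects
pred_fun = function(model) {
  predict(model, newdata = newdata, re.form = NA, type = "response")
}

boot_out = bootMer(fit, pred_fun, nsim = 100)   # increase nsim in practice
t(apply(boot_out$t, 2, quantile, probs = c(0.025, 0.5, 0.975)))
```

Note the decision here to set `re.form = NA`: the random effects are averaged out, which is only one of several reasonable choices for the simulations.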

Finally, as with all `finalfit` functions, results can be produced as individual variables using `condense = FALSE`. This is particularly useful for plotting.

```r
library(finalfit)
library(ggplot2)
theme_set(theme_bw())

explanatory = c("nodes", "extent.factor", "perfor.factor")
dependent = 'mort_5yr'

colon_s %>% 
  finalfit_newdata(explanatory = explanatory, rowwise = FALSE, newdata = list(
    rep(seq(0, 30), 4),
    c(rep("Muscle", 62), rep("Adjacent structures", 62)),
    c(rep("No", 31), rep("Yes", 31), rep("No", 31), rep("Yes", 31))
  )) -> newdata

colon_s %>% 
  glmmulti(dependent, explanatory) %>% 
  boot_predict(newdata, boot_compare = FALSE, R = 100, condense = FALSE) %>% 
  ggplot(aes(x = nodes, y = estimate,
             ymin = estimate_conf.low, ymax = estimate_conf.high,
             fill = extent.factor)) +
    geom_line(aes(colour = extent.factor)) +
    geom_ribbon(alpha = 0.1) +
    facet_grid(. ~ perfor.factor) +
    xlab("Number of positive lymph nodes") +
    ylab("Probability of death") +
    labs(fill = "Extent of tumour", colour = "Extent of tumour") +
    ggtitle("Probability of death by lymph node count")
```

So there you have it. Straightforward bootstrapped simulations of model predictions, together with comparisons and easy plotting.

Getting `finalfit` results out of RStudio, and particularly into Microsoft Word: here is how.

Make sure you are on the most up-to-date version of `finalfit`.

```r
devtools::install_github("ewenharrison/finalfit")
```

What follows is for demonstration purposes and is not meant to illustrate model building.

Does a tumour characteristic (differentiation) predict 5-year survival?

First, explore the variable of interest (exposure) by making it the dependent variable.

```r
library(finalfit)
library(dplyr)

dependent = "differ.factor"

# Specify explanatory variables of interest
explanatory = c("age", "sex.factor", "extent.factor", "obstruct.factor", "nodes")
```

Note this useful alternative way of specifying explanatory variable lists:

```r
colon_s %>% 
  select(age, sex.factor, extent.factor, obstruct.factor, nodes) %>% 
  names() -> explanatory
```

Look at associations between our exposure and other explanatory variables. Include missing data.

```r
colon_s %>% 
  summary_factorlist(dependent, explanatory, p=TRUE, na_include=TRUE)
```

```
            label    levels        Well    Moderate       Poor      p
      Age (years) Mean (SD) 60.2 (12.8) 59.9 (11.7)  59 (12.8)  0.788
              Sex    Female   51 (11.6)  314 (71.7)  73 (16.7)  0.400
                       Male    42 (9.0)  349 (74.6)  77 (16.5)       
 Extent of spread Submucosa    5 (25.0)   12 (60.0)   3 (15.0)  0.081
                     Muscle   12 (11.8)   78 (76.5)  12 (11.8)       
                     Serosa   76 (10.2)  542 (72.8) 127 (17.0)       
        Adjacent structures     0 (0.0)   31 (79.5)   8 (20.5)       
      Obstruction        No    69 (9.7)  531 (74.4) 114 (16.0)  0.110
                        Yes   19 (11.0)  122 (70.9)  31 (18.0)       
                    Missing    5 (25.0)   10 (50.0)   5 (25.0)       
            nodes Mean (SD)   2.7 (2.2)   3.6 (3.4)  4.7 (4.4) <0.001

Warning messages:
1: In chisq.test(tab, correct = FALSE) :
  Chi-squared approximation may be incorrect
2: In chisq.test(tab, correct = FALSE) :
  Chi-squared approximation may be incorrect
```

Note the missing data in `obstruct.factor`. We will drop this variable for now (again, this is for demonstration only). Also see that `nodes` has not been labelled.

There are small numbers in some cells, generating the `chisq.test` warnings (expected count less than 5 in a cell). Generate the final table.

```r
Hmisc::label(colon_s$nodes) = "Lymph nodes involved"
explanatory = c("age", "sex.factor", "extent.factor", "nodes")

colon_s %>% 
  summary_factorlist(dependent, explanatory,
    p=TRUE, na_include=TRUE, add_dependent_label=TRUE) -> table1

table1
```

```
 Dependent: Differentiation                    Well    Moderate       Poor      p
                Age (years) Mean (SD)   60.2 (12.8) 59.9 (11.7)  59 (12.8)  0.788
                        Sex    Female     51 (11.6)  314 (71.7)  73 (16.7)  0.400
                                 Male      42 (9.0)  349 (74.6)  77 (16.5)       
           Extent of spread Submucosa      5 (25.0)   12 (60.0)   3 (15.0)  0.081
                               Muscle     12 (11.8)   78 (76.5)  12 (11.8)       
                               Serosa     76 (10.2)  542 (72.8) 127 (17.0)       
                  Adjacent structures       0 (0.0)   31 (79.5)   8 (20.5)       
       Lymph nodes involved Mean (SD)     2.7 (2.2)   3.6 (3.4)  4.7 (4.4) <0.001
```

Now examine the explanatory variables against the outcome. Check that the plot runs OK.

```r
explanatory = c("age", "sex.factor", "extent.factor", "nodes", "differ.factor")
dependent = "mort_5yr"

colon_s %>% 
  finalfit(dependent, explanatory, dependent_label_prefix = "") -> table2
```

```
 Mortality 5 year                        Alive        Died           OR (univariable)         OR (multivariable)
      Age (years)           Mean (SD) 59.8 (11.4) 59.9 (12.5)  1.00 (0.99-1.01, p=0.986)  1.01 (1.00-1.02, p=0.195)
              Sex              Female  243 (47.6)  194 (48.0)                          -                          -
                                 Male  268 (52.4)  210 (52.0)  0.98 (0.76-1.27, p=0.889)  0.98 (0.74-1.30, p=0.885)
 Extent of spread           Submucosa    16 (3.1)     4 (1.0)                          -                          -
                               Muscle   78 (15.3)    25 (6.2)  1.28 (0.42-4.79, p=0.681)  1.28 (0.37-5.92, p=0.722)
                               Serosa  401 (78.5)  349 (86.4) 3.48 (1.26-12.24, p=0.027) 3.13 (1.01-13.76, p=0.076)
                  Adjacent structures    16 (3.1)    26 (6.4) 6.50 (1.98-25.93, p=0.004) 6.04 (1.58-30.41, p=0.015)
 Lymph nodes involved       Mean (SD)   2.7 (2.4)   4.9 (4.4)  1.24 (1.18-1.30, p<0.001)  1.23 (1.17-1.30, p<0.001)
  Differentiation                Well   52 (10.5)   40 (10.1)                          -                          -
                             Moderate  382 (76.9)  269 (68.1)  0.92 (0.59-1.43, p=0.694)  0.70 (0.44-1.12, p=0.132)
                                 Poor   63 (12.7)   86 (21.8)  1.77 (1.05-3.01, p=0.032)  1.08 (0.61-1.90, p=0.796)
```

```r
colon_s %>% 
  or_plot(dependent, explanatory, breaks = c(0.5, 1, 5, 10, 20, 30))
```

Important: in most R Markdown set-ups, environment objects need to be saved and then loaded into the R Markdown document.

```r
# Save objects for knitr/markdown
save(table1, table2, dependent, explanatory, file = "out.rda")
```

We use an RStudio Server Pro set-up on Ubuntu, but these instructions should work fine for most RStudio/R Markdown default set-ups.

In RStudio, select `File > New File > R Markdown`.

A useful template file is produced by default. Try hitting `Knit to Word` on the `knitr` button at the top of the `.Rmd` script window.

Now paste this into the file:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "22/5/2018"
output:
  word_document: default
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics

```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality

```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality

```{r figure1, echo = FALSE}
colon_s %>% 
  or_plot(dependent, explanatory)
```

Now, edit the Word template. Click on a table; the style should be `Compact`. Right click > `Modify... > font size = 9`. Alter heading and text styles in the same way as desired. Save this as `template.docx` and upload it to your project folder. Add a reference to it in the `.Rmd` YAML heading, as below. Make sure you get the spacing correct.
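The key lines of the YAML header, excerpted from the full example further down, look like this (note the indentation under `output:`):

```
output:
  word_document:
    reference_docx: template.docx
```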

The plot also doesn’t look quite right and it prints with warning messages. Experiment with `fig.width` to get it looking right.

Now paste this into your `.Rmd` file and run:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  word_document:
    reference_docx: template.docx
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics

```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality

```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality

```{r figure1, echo = FALSE, warning=FALSE, message=FALSE, fig.width=10}
colon_s %>% 
  or_plot(dependent, explanatory)
```

This is now looking good for me, and further tweaks can be made.

Default settings for PDF:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  pdf_document: default
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics

```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality

```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality

```{r figure1, echo = FALSE}
colon_s %>% 
  or_plot(dependent, explanatory)
```

Again, ok but not great.

We can fix the plot in exactly the same way. But the table is off the side of the page. For this we use the `kableExtra` package; install it in the normal manner. You may also want to alter the margins of your page using `geometry` in the preamble.

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  pdf_document: default
geometry: margin=0.75in
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
library(kableExtra)
load("out.rda")
```

## Table 1 - Demographics

```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs=TRUE)
```

## Table 2 - Association between tumour factors and 5 year mortality

```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs=TRUE) %>% 
  kable_styling(font_size=8)
```

## Figure 1 - Association between tumour factors and 5 year mortality

```{r figure1, echo = FALSE, warning=FALSE, message=FALSE, fig.width=10}
colon_s %>% 
  or_plot(dependent, explanatory)
```

This is now looking pretty good for me as well.

There you have it. A pretty quick workflow to get final results into Word and a PDF.

The `finalfit` package brings together the day-to-day functions we use to generate final results tables and plots when modelling. I spent many years repeatedly manually copying results from R analyses and built these functions to automate our standard healthcare data workflow. It is particularly useful when undertaking a large study involving multiple different regression analyses. When combined with R Markdown, the reporting becomes entirely automated. Its design follows Hadley Wickham’s tidy tool manifesto.
It lives on GitHub.

You can install `finalfit` from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("ewenharrison/finalfit")
```

It is recommended that this package is used together with `dplyr`, which is a dependency.

Some of the functions require `rstan` and `boot`. These have been left as `Suggests` rather than `Depends` to avoid unnecessary installation. If needed, they can be installed in the normal way:

```r
install.packages("rstan")
install.packages("boot")
```

To install off-line (or in a Safe Haven), download the zip file and use `devtools::install_local()`.
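For example (the file name here is an assumption; use whatever the downloaded zip is called):

```r
# Download the zip of the repository first (e.g. from GitHub), then:
devtools::install_local("finalfit-master.zip")
```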

`summary_factorlist()` is a wrapper used to aggregate any number of explanatory variables by a single **variable of interest**. This is often “Table 1” of a published study. When categorical, the variable of interest can have a maximum of five levels. It uses `Hmisc::summary.formula()`.

```r
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics by variable of interest ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"  # Bowel perforation

colon_s %>% 
  summary_factorlist(dependent, explanatory, p=TRUE, add_dependent_label=TRUE)
```

See the other options relating to inclusion of missing data, mean vs. median for continuous variables, column vs. row proportions, including a total column, etc.
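Something along these lines, for example; the argument names here are from memory and may differ between versions, so check `?summary_factorlist`:

```r
colon_s %>% 
  summary_factorlist(dependent, explanatory,
    cont = "median",     # median (IQR) rather than mean (SD) for continuous variables
    na_include = TRUE,   # include missing data as an explicit level
    column = FALSE,      # row rather than column proportions
    total_col = TRUE,    # add a total column
    p = TRUE)
```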

`summary_factorlist()` is also commonly used to summarise any number of variables by an **outcome variable** (say dead yes/no).

```r
# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor")
dependent = 'mort_5yr'

colon_s %>% 
  summary_factorlist(dependent, explanatory, p=TRUE, add_dependent_label=TRUE)
```

Tables can be knitted to PDF, Word or html documents. We do this in RStudio from a .Rmd document. Example chunk:

```{r, echo = FALSE, results='asis'}
knitr::kable(example_table, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

The second main feature is the ability to create final tables for linear (`lm()`), logistic (`glm()`), hierarchical logistic (`lme4::glmer()`) and Cox proportional hazards (`survival::coxph()`) regression models.

The `finalfit()` “all-in-one” function takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication, including summary statistics, and univariable and multivariable regression analyses. The first columns are those produced by `summary_factorlist()`. The appropriate regression model is chosen on the basis of the dependent variable type and other arguments passed.

Of the form: `glm(dependent ~ explanatory, family="binomial")`

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'

colon_s %>% 
  finalfit(dependent, explanatory)
```

Where a multivariable model contains a subset of the variables specified in the full univariable set, this can be specified.

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'

colon_s %>% 
  finalfit(dependent, explanatory, explanatory_multi)
```

Of the form: `lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")`

Hierarchical/mixed effects/multilevel logistic regression models can be specified using the argument `random_effect`. At the moment it is just set up for random intercepts (i.e. `(1 | random_effect)`), but in the future I’ll adjust this to accommodate random gradients if needed (i.e. `(variable1 | variable2)`).

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'

colon_s %>% 
  finalfit(dependent, explanatory, explanatory_multi, random_effect)
```

Of the form: `survival::coxph(dependent ~ explanatory)`

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"

colon_s %>% 
  finalfit(dependent, explanatory)
```

`metrics=TRUE` provides common model metrics. The output is a list of two dataframes. Note the chunk specification for the output below.

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'

colon_s %>% 
  finalfit(dependent, explanatory, metrics=TRUE)
```

```{r, echo=FALSE, results="asis"}
knitr::kable(table7[[1]], row.names=FALSE, align=c("l", "l", "r", "r", "r"))
knitr::kable(table7[[2]], row.names=FALSE)
```

Rather than going all-in-one, any number of subset models can be manually added on to a `summary_factorlist()` table using `finalfit_merge()`. This is particularly useful when models take a long time to run or are complicated.

Note the requirement for `fit_id=TRUE` in `summary_factorlist()`. `fit2df()` extracts, condenses, and adds metrics to supported models.

```r
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>% 
  summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary

colon_s %>% 
  glmuni(dependent, explanatory) %>% 
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

colon_s %>% 
  glmmulti(dependent, explanatory) %>% 
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

colon_s %>% 
  glmmixed(dependent, explanatory, random_effect) %>% 
  fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>% 
  finalfit_merge(example.univariable) %>% 
  finalfit_merge(example.multivariable) %>% 
  finalfit_merge(example.multilevel) %>% 
  select(-c(fit_id, index)) %>%                   # remove unnecessary columns
  dependent_label(colon_s, dependent, prefix="")  # place dependent variable label
```

`stan`

Our own particular `rstan` models are supported and will be documented in the future. Broadly, if you are running (hierarchical) logistic regression models in [Stan](http://mc-stan.org/users/interfaces/rstan) with coefficients specified as a vector labelled `beta`, then `fit2df()` will work directly on the `stanfit` object in a similar manner as if it were a `glm` or `glmerMod` object.

Models can be summarised with odds ratio/hazard ratio plots using `or_plot`, `hr_plot` and `surv_plot`.

```r
# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'

colon_s %>% 
  or_plot(dependent, explanatory)
# Previously fitted models (`glmmulti()` or `glmmixed()`) can be provided directly to `glmfit`
```

```r
# HR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"

colon_s %>% 
  hr_plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`
```

KM plots can be produced using the `survminer` package.

```r
# KM plot
explanatory = c("perfor.factor")
dependent = "Surv(time, status)"

colon_s %>% 
  surv_plot(dependent, explanatory, xlab="Time (days)", pval=TRUE, legend="none")
```

Use `Hmisc::label()` to assign labels to variables for tables and plots.

```r
label(colon_s$age.factor) = "Age (years)"
```

Export dataframe tables directly, or into R Markdown via `knitr::kable()`.

Note that the wrapper `summary_missing()` is also useful. It wraps `mice::md.pattern()`.

```r
colon_s %>% 
  summary_missing(dependent, explanatory)
```

Development will be on-going, but any input appreciated.

How to install the `summarizer` package on a server without internet access, such as the NHS Safe Havens:

- Upload summarizer-master.zip from here to the server.
- Unzip.
- Run this:

```r
library(devtools)
source = devtools:::source_pkg("summarizer-master/")  # path to the unzipped folder
install(source)
```

As per the comments, `devtools::install_local()` has previously failed, but may now also work directly.

`lme4::lmer` is a useful frequentist approach to hierarchical/multilevel linear regression modelling. For good reason, the model output only includes t-statistics, not p-values.

Yes, *p*-values are evil and we should continue to try and expunge them from our analyses. But I keep getting asked about this. So here is a simple bootstrap method to generate two-sided parametric *p*-values on the fixed effects coefficients. Interpret with caution.

```r
library(lme4)

# Run model with lme4 example data
fit = lmer(angle ~ recipe + temp + (1|recipe:replicate), cake)

# Model summary
summary(fit)

# lme4 profile method confidence intervals
confint(fit)

# Bootstrapped parametric p-values
boot.out = bootMer(fit, fixef, nsim=1000)  # nsim determines p-value decimal places
p = rbind(
  (1-apply(boot.out$t<0, 2, mean))*2,
  (1-apply(boot.out$t>0, 2, mean))*2)
apply(p, 2, min)

# Alternative "pipe" syntax
library(magrittr)

lmer(angle ~ recipe + temp + (1|recipe:replicate), cake) %>% 
  bootMer(fixef, nsim=100) %$% 
  rbind(
    (1-apply(t<0, 2, mean))*2,
    (1-apply(t>0, 2, mean))*2) %>% 
  apply(2, min)
```


In the meantime, there are limitations. Overwhelming evidence shows that the quality of reporting of prediction model studies is poor. In some instances, the details of the actual model are considered commercially sensitive and are not published, making the assessment of the risk of bias and potential usefulness of the model difficult.

In this edition of *HPB*, Beal and colleagues aim to validate the American College of Surgeons National Quality Improvement Program (ACS NSQIP) Surgical Risk Calculator (SRC) using data from 854 gallbladder cancer and extrahepatic cholangiocarcinoma patients from the US Extrahepatic Biliary Malignancy Consortium. The authors conclude that the “estimates of risk were variable in terms of accuracy and generally calculator performance was poor”. The SRC underpredicted the occurrence of all examined end-points (death, readmission, reoperation and surgical site infection), and discrimination and calibration were particularly poor for readmission and surgical site infection. This is not the first report of predictive failures of the SRC. Possible explanations cited previously include small sample size, homogeneity of patients, and too few institutions in the validation set. That does not seem to be the case in the current study.

The SRC is a general-purpose risk calculator and while it may be applicable across many surgical domains, it should be used with caution in extrahepatic biliary cancer. It is not clear why the calculator does not provide measures of uncertainty around estimates. This would greatly help patients interpret its output and would go a long way to addressing some of the broader concerns around accuracy.

This is extreme parenchymal-sparing hepatectomy (PSH) in 169 patients with colorectal liver metastases. In all cases, tumour was touching or infiltrating portal pedicles or hepatic veins, a situation where most surgeons would advocate a major hepatectomy where possible. The PSH by its nature results in a 0 mm resection margin when the vessel is preserved, which was the aim in many of these procedures. Although this is off-putting, the cut-edge recurrence rate was no higher than average.

PSH in the form of “easy atypicals” is performed by all HPB surgeons. There are two main differences here. First is the aim to detach tumours from intrahepatic vascular structures. For instance, hepatic veins in contact with tumour were preserved and only resected if infiltrated. Even then, they were tangentially incised if possible and reconstructed with a bovine pericardial patch. Second is the careful attention paid to identifying and using communicating hepatic veins. This is well described but used extensively here to allow complete resection of segments while avoiding congestion in the draining region.

Short-term mortality and morbidity rates are comparable with other published series. A median survival of 36 months and 5-year overall survival of around 30% is reasonable given some of these patients may not be offered surgery in certain centres. The authors describe the parenchymal-sparing approach “failing” in 14 (10%) patients: 7 (5%) had recurrence at the cut edge and 8 (6%) within segments which would have been removed using a standard approach. 44% of the 55 patients with liver-only recurrence underwent re-resection.

This is not small surgery. The average operating time is 8.5 h with the longest taking 18.5 h. The 66% thoracotomy rate is also notable in an era of minimally invasive surgery and certainly differs from my own practice. This study is challenging and I look forward to the debates that should arise from it.

In this issue of *HPB*, Lordan and colleagues report a propensity-score matched case-control series of PSH vs. major hepatectomy. The results are striking. The PSH approach was associated with less blood transfusion (10.1 vs 27.7%), fewer major complications (3.8 vs 9.2%), and lower rates of POHF (0 vs 5.5%). Unusually, perioperative mortality (0.8 vs 3.8%) was also lower in the PSH group and longer-term oncologic and survival outcomes were similar.

Results of propensity-matched analyses must always be interpreted with selection bias in mind. Residual confounding always exists: the patients undergoing major hepatectomy almost certainly had undescribed differences from the PSH group and may not have been technically suitable for PSH. Matching did not account for year of surgery, so with PSH becoming more common the generally improved outcomes over time will bias in favour of the parenchymal-sparing approach. Yet putting those concerns aside, there are two salient results. Firstly, PSH promises less POHF and in this series, there was none. Secondly, PSH promises greater opportunity for redo liver surgery. There was 50% liver-only recurrence in both groups. Although not reported by the authors, a greater proportion of PSH patients underwent redo surgery (35/119 (29.4%) vs. 23/130 (17.7%), p=0.03). Perhaps for some patients, the PSH revolution is delivering some of its promised advantages.

In *HPB*, Kim and colleagues examine survival after recurrence of bile duct cancer. The facts of this disease are always sobering: the median survival after diagnosis of recurrence is 7 months. The study is useful in that the authors have sufficient numbers to examine subgroups of those with recurrence to identify which patients may potentially benefit from salvage treatment (which was mostly chemotherapy). For those with poorly differentiated primary tumours, a short time to recurrence, poor performance status and elevated CA19-9, survival was only a handful of months.

This is a pragmatic non-randomised study with inherent selection bias, but the aim was not to determine the potential benefit of salvage treatment (we await the full publication of studies such as BILCAP). Also, the predictive ability of the model was not particularly high (c-statistic= 0.65). However, it does serve to illustrate the important point that for some very unfortunate patients with poor-prognosis recurrence, survival will be short and they may be better advised to focus on priorities other than chemotherapy. As Atul Gawande remarks in *Being Mortal*, “We’ve been wrong about what our job is in medicine. We think our job is to ensure health and survival. But really it is larger than that. It is to enable well-being.”