Multiple imputation support in Finalfit

We are using multiple imputation more frequently to “fill in” missing data in clinical datasets. Multiple datasets are created, models run, and results pooled so conclusions can be drawn.

We’ve put some improvements into Finalfit on GitHub to make it easier to use with the mice package. These will go to CRAN soon but not immediately.

See finalfit.org/missing.html for more on handling missing data.

Let’s get straight to it by imputing smoking status in a cancer dataset.

Install

Create missing data for example

Check data

Multivariate Imputation by Chained Equations (mice)

miceis a great package and contains lots of useful functions for diagnosing and working with missing data. The purpose here is to demonstrate how mice can be integrated into the Finalfit workflow with inclusion of model from imputed datasets in tables and plots.

Choose variables to impute and variables to impute from

finalfit::missing_predictorMatrix()makes it easy to specify which variables do what. For instance, we often do not want to impute our outcome or explanatory variable of interest (exposure), but do want to use them to impute other variables.

This is straightforward to code using the arguments drop_from_imputed and drop_from_imputer.

Create imputed datasets

A set of multiple imputed datasets (mids) can be created as below. Various checks should be performed to ensure you understand the data that has been created. See here.

Run models

Here we sill use a logistic regression model. The with.mids() function takes a model with a formula object, so use base R functions rather than Finalfit wrappers.

We now have multiple models run with each of the imputed datasets. We haven’t found good methods for combining common model metrics like AIC and c-statistic. I’d be interested to hear from anyone working on methods for this. Metrics can be extracted for each individual model to give an idea of goodness-of-fit and discrimination. We’re not suggesting you use these to compare imputed datasets, but could use them to compare models containing different variables created using the imputed datasets, e.g.

Pool results

Rubin’s rules are used to combine results of multiple models.

Plot results

Pooled results can be passed directly to Finalfit plotting functions.

Put results in table

The pooled result can be passed directly to fit2df() as can many common models such as lm(), glm(), lmer(), glmer(), coxph(), crr(), etc.

Combine results with summary data

Any model passed through fit2df() can be combined with a summary table generated with summary_factorlist() and any number of other models.

Combine results with other models

Models can be run separately, or using the finalfit()wrapper including the argument keep_fit_it = TRUE.

Model missing explicitly in complete case models

A straightforward method of modelling missing cases is to make them explicit using the forcats function fct_explicit_na().

Export tables to PDF and Word

As described elsewhere, knitr::kable() can be used to export good looking tables.

Leave a Reply

Your email address will not be published. Required fields are marked *