I’ve moved over to Bayesian methods and will post on mixed models using Stan soon. Thanks again.

You can see this in the standard Spiegelhalter paper – http://www.medicine.cf.ac.uk/media/filer_public/2010/10/11/journal_club_-_spiegelhalter_stats_in_med_funnel_plots.pdf – particularly Figure 2 on page 1187. Spiegelhalter basically works backwards from exact binomial confidence intervals to define the control limits, while the APHO tool simply uses Wilson score confidence intervals.

The problem, as you know, is that a CI is a statement of uncertainty about the population mean given the sample mean, not, in general, the other way round. Funnel limits need the second direction: the spread of the sample mean given the population mean.
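To make the two directions concrete, here is a rough Python sketch, not the APHO spreadsheet's or Spiegelhalter's actual calculation. It compares a Wilson score interval computed as if the target rate had been observed in a sample of size n with plain binomial quantiles of the sampling distribution at that target. The function names, the 95% level and the example values (target 0.1, n = 50) are my own assumptions, and the interpolation Spiegelhalter uses to smooth the exact limits is left out.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval: uncertainty about the population
    proportion given an observed proportion p_hat from n cases."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def binomial_quantile(q, n, p):
    """Smallest count r with P(R <= r) >= q for R ~ Binomial(n, p)."""
    cum = 0.0
    for r in range(n + 1):
        cum += math.comb(n, r) * p**r * (1 - p) ** (n - r)
        if cum >= q:
            return r
    return n


def sampling_limits(p0, n, alpha=0.05):
    """Limits for the observed proportion given the target p0
    (no interpolation, so they sit on multiples of 1/n)."""
    lo = binomial_quantile(alpha / 2, n, p0) / n
    hi = binomial_quantile(1 - alpha / 2, n, p0) / n
    return lo, hi


if __name__ == "__main__":
    p0, n = 0.1, 50  # illustrative values only
    print("Wilson around the target:", wilson_interval(p0, n))
    print("Binomial sampling limits:", sampling_limits(p0, n))
```

The two pairs of limits need not agree, and the gap is most noticeable for small n and rates near 0 or 1.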

The APHO spreadsheet is quite widely used, and so far our efforts to get it updated have not come to anything. One of my colleagues has been working on post-operative mortality, so she’s been putting a bit more effort into getting the APHO spreadsheet corrected; we even have a replacement Excel spreadsheet as of a couple of weeks ago. Now we just need to continue convincing other people until we can get it corrected.

A comparison with binomial limits might still be interesting, but certainly the more appropriate comparison would come from the model. Comparing to the overall mean is probably daft. If the model were fully adjusting for everything appropriately (in some magic manner), then comparing to carefully chosen target values could be interesting.

Aargh dissertations – actually, I’m also not too far off track myself, though it’s not quite as exciting as that.

Interested in your line of thinking. The funnel plot control limits here are produced in a standard manner based on a population mean and, as you know, simply represent the sampling distribution around that mean. This is how Public Health England would do them: http://www.apho.org.uk/default.aspx?RID=39403

However, they are definitely not correct for the purpose they are used for here, as the points are random effects estimates and so are shrunk towards the mean. With the full model, control limits could be simulated. More broadly, comparing the individuals to a population mean is probably not useful anyway. Have now got a full Bayesian model working with cross validation that is probably a more robust way of identifying divergent practice. Coming to a dissertation near you!
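As a rough illustration of why standard limits are too wide for shrunken estimates, and of what simulating limits could look like, here is a Python sketch. It is not the model from the post: it swaps in a simple beta-binomial shrinkage towards the target rate as a stand-in for the random effects model, and `prior_strength`, the function names and the example values are all hypothetical.

```python
import random


def shrunken_estimate(r, n, p0, prior_strength):
    """Beta-binomial posterior mean: pulls the raw rate r/n towards p0.
    A stand-in for a random effects estimate, not the post's model."""
    a = p0 * prior_strength
    b = (1 - p0) * prior_strength
    return (a + r) / (a + b + n)


def simulate_limits(n, p0, prior_strength, alpha=0.05, n_sims=5000, seed=1):
    """Simulate a unit whose true rate equals p0 and return the central
    (1 - alpha) range of the raw rate and of the shrunken estimate."""
    rng = random.Random(seed)
    raw, shrunk = [], []
    for _ in range(n_sims):
        # Binomial draw built from n Bernoulli trials
        r = sum(rng.random() < p0 for _ in range(n))
        raw.append(r / n)
        shrunk.append(shrunken_estimate(r, n, p0, prior_strength))
    raw.sort()
    shrunk.sort()
    lo_i = int(alpha / 2 * n_sims)
    hi_i = int((1 - alpha / 2) * n_sims) - 1
    return (raw[lo_i], raw[hi_i]), (shrunk[lo_i], shrunk[hi_i])


if __name__ == "__main__":
    raw_limits, shrunk_limits = simulate_limits(n=50, p0=0.1, prior_strength=30)
    print("raw rate limits:     ", raw_limits)
    print("shrunken rate limits:", shrunk_limits)
```

Because the shrunken estimates are pulled towards the mean, their simulated range is narrower than that of the raw rates, so limits built for raw rates would flag too few units. With the full model, the same idea would mean simulating data from the fitted model and refitting, rather than using this shortcut.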

I’m going to ignore the topic entirely and instead point out that the control limits on your funnel plot are incorrect: methods designed for making statements about the population mean based on the sample mean don’t work well in the other direction. You need to work backwards a bit instead, and perhaps take account of the discrete nature of the sample data (to get some really interesting and spiky ‘funnels’).
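A minimal Python sketch of where the spikes come from, assuming a target rate of 0.1 and a 97.5% upper limit (both picked purely for illustration, as is the function name): the upper limit is a whole number of events divided by n, so as n grows it drifts down and then jumps up each time the limiting count increases by one.

```python
import math


def upper_limit(n, p0, q=0.975):
    """Upper control limit as a proportion: the smallest count r with
    P(R <= r) >= q for R ~ Binomial(n, p0), divided by n."""
    cum = 0.0
    for r in range(n + 1):
        cum += math.comb(n, r) * p0**r * (1 - p0) ** (n - r)
        if cum >= q:
            return r / n
    return 1.0


sizes = list(range(20, 61))
limits = [upper_limit(n, 0.1) for n in sizes]

# Sample sizes at which the limit jumps up relative to the previous n
jumps = [n for n, prev, cur in zip(sizes[1:], limits, limits[1:]) if cur > prev]

if __name__ == "__main__":
    for n, lim in zip(sizes, limits):
        print(n, round(lim, 3))
    print("limit jumps up at n =", jumps)
```

Plotting `limits` against `sizes` gives the sawtooth edge rather than a smooth curve.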

That said, it works well enough for a blogpost.

Yours pedantically,

Matt