### Landmark Papers in General Surgery: Review

A longer version of my review in Surgeons’ News.

When should a clinical study be considered a landmark? Must it have changed practice? Does the strength of the study have a bearing – should only randomised clinical trials be considered, for instance? The new Landmark Papers series from Oxford University Press has volumes in Neurosurgery, Cardiovascular Medicine and Nephrology. A book covering General Surgery from authors based mainly in Glasgow is hot off the press.

The editors have done a great job in producing a clean, well-structured, easy-to-read book that will be of use to both practising surgeons and trainees. The book is divided by general surgery subspecialty, with each chapter containing a number of themes. In emergency surgery, for instance, sections include CT assessment of the acute abdomen and laparoscopic versus open appendicectomy. An important study addressing the theme is provided, sometimes together with related references. Following a brief description, study design and results are tabulated, after which conclusions and a critique are made.

Before opening the book I wondered whether there may be a problem in its conception: in the modern world of the systematic review and meta-analysis, what is the place of a book in which surgeons highlight a single publication in a deliberately unsystematic manner? Is this not harking back to the days when one cited evidence fitting one’s prejudices, ignoring troublesome contradictory reports?

Actually, rather than a problem, I found this refreshing. This detailed analysis of individual trials is reminiscent of the journal clubs we struggle to maintain in our busy modern practice. Although I am an advocate for the systematic review, too often its focus is on the certainty surrounding a point estimate of outcome. This book highlights the importance of clear consideration of the intention of a trial, whether those aims were achieved, what biases exist and ultimately whether the results apply to my patients or not. In any case, in areas where conflict exists, multiple trials are often described and the balance of interpretation discussed in the critique.

Another concern was that it would date almost as soon as it was published. With 140 000 citations returned from the PubMed database for an all-fields search for “surgery” in 2012, how can a static publication like this hope to remain relevant? Again, on the whole this concern was unfounded. A condensation of the evidence for surgery, such as this, shows that the pace of change is slower than we possibly recognise. While the majority of included trials are from the last 15 years, there are fewer than I expected from, say, the last 4 years.

A publication such as this puts itself up there to be criticised for the omission of studies deemed important by a reviewer, and it would be remiss of me not to comply. Actually, the editors have done a good job and irritatingly I found it difficult to identify big omissions. On pulling up the top 50 most cited papers in surgery, I found the great majority had been included. In my own (small) field, the landmark paper by Mazzaferro on the surgical treatment of hepatocellular carcinoma has been cited more than most other surgical papers (2400 times) and warrants inclusion. The classification of surgical complications by Dindo and Clavien is at number 11 in my top 50 and probably deserves mention.

The editors have achieved their aim with this book and I would recommend it unreservedly. Minor niggles are the truncated “et al” citations – give us the whole citation so we can see the senior author please. No graphs are included, which is fine, but where the main study is a meta-analysis, including the forest plot for the primary outcome measure conveys information better than a table. Finally, is there a digital version of this book? I circled the Oxford website in vain but could not find a page where it is possible to buy one.

Must a landmark paper have changed practice? No, as illustrated by the neat discussion on the GALA (general versus local anaesthetic in carotid endarterectomy) trial – an example of a landmark randomised trial that has not changed practice. Must a landmark paper be an RCT? No, as the classic level 4 evidence for total mesorectal excision by Bill Heald demonstrates – some observational studies have done more to alter practice in surgery than many RCTs.

### Hepatitis C virus, tumour and liver transplantation

From my HPB highlights this month.

Do patients with hepatocellular carcinoma (HCC) on a background of hepatitis C virus (HCV) have worse outcomes after liver transplantation than non-HCV patients? This relatively straightforward question continues to vex and published studies are contradictory. Molecular features of HCC which are associated with aggressive behaviour are up-regulated in the presence of HCV, providing a biological mechanism to support the hypothesis. The theory is borne out in early single-centre studies, but the largest analysis, published by Thuluvath in 2009 using the United Network for Organ Sharing database, contradicted these. HCV+ patients were shown to have a lower survival rate than HCV- patients, regardless of their HCC status. This is to be expected. However, HCV had no additional negative impact on survival in patients with HCC.

In this edition of HPB, Dumitra and colleagues describe a further single-centre study from Montreal. They conclude that HCC+/HCV+ patients have a significantly worse outcome than those with HCC or HCV alone. So why the contradiction? It may be that length of follow-up is important. This study provides survival curves out to 10 years. A cluster of deaths after 5 years in the HCV+/HCC+ group results in a significantly worse outcome in this group, although the numbers at risk are low. However, loss to follow-up is an unusually low 1.2% and explant pathology is available for almost all patients – detail not often available in studies using administrative databases. In a multivariable analysis controlling for recipient age, gender, MELD score and donor risk index (DRI), the combined effect of HCC+/HCV+ gives a hazard twice that of HCC+/HCV-.

HCV graft infection after liver transplantation is universal and the course of recurrent cirrhosis accelerated. Controlling HCV recurrence with newer antiviral agents will improve long-term survival and this study suggests the possibility of additional benefits in HCC+/HCV+ patients. Other modifiable variables such as donor age and DRI are unlikely to have an impact, given HCC patients rarely have the luxury of a wide choice of donor grafts.

### A map of the world by tweets

With geo-tagging enabled, tweets include information on the location of the user when the tweet was sent. Miguel Rios (@miguelrios) has plotted locations of billions of tweets to create maps of the world. This is pretty amazing stuff – a world map rendered just from Twitter posts!

Maps are created using every tweet from 2009 using R and the ggmap package.

Post is here with more here on flickr.

### Mickey Mouse and the tubes connecting the liver

In liver surgery, it’s often important to know the exact layout of the connections the liver has to the rest of the body. Here are some images which hopefully make it clear. The liver is unusual because it has two blood supplies. The first is an artery, the hepatic artery, which carries oxygen to the liver. The other is the portal vein, which carries blood from the guts to the liver and contains the nutrients from food. The portal vein carries 3 times as much blood as the artery and is not to be messed with – 34% of patients with a portal vein injury do not survive.

The other important tube is the bile duct. This drains bile from the liver to the guts. If it gets blocked – by a gallstone or cancer – the patient becomes jaundiced (the skin going yellow).

We use an ultrasound machine to visualise the vessels and the bile duct. It can be tricky and difficult to interpret. The boss has a good technique for getting orientated – the Mickey Mouse sign. When seen in the transverse plane – imagine sitting at the patient’s feet looking up through the body towards the head – the large portal vein with the artery and bile duct in front looks like Mickey. I use this technique every time.

### Tweets of Surgical Colleges – what does it say about them?

What do the UK and Ireland Surgical Royal Colleges tweet about and how do they compare to the American College of Surgeons?

Twitter allows retrieval of the last 3200 tweets of a given user. Here are all the tweets ever sent by the Royal Colleges, as retrieved a few days ago. The American College has tweeted over 6000 times, so only the latest 3200 are included. The Glasgow College is just getting going.

There is a bit of processing first. Charts are generated after removal of “stop words” – all the little words that go in between. Words then have common endings removed (e.g. -ing; stemming) and the most common ending for the group replaced (stem completion).
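As a rough illustration of that processing, here is a minimal Python sketch (not the script used for these charts). The stop-word list, the suffix list and the function names are all made up for the example, and the suffix stripping is far cruder than a proper stemmer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "are"}

# Hypothetical suffix list for a crude stemmer (this is not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenise(text):
    """Lower-case the text and return runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(tweets):
    """Count stems, then label each stem with its most common original word."""
    stem_counts = Counter()
    originals = {}
    for tweet in tweets:
        for word in tokenise(tweet):
            if word in STOP_WORDS:
                continue
            s = stem(word)
            stem_counts[s] += 1
            originals.setdefault(s, Counter())[word] += 1
    # Stem completion: report each stem under its most frequent original form.
    return {originals[s].most_common(1)[0][0]: n for s, n in stem_counts.items()}


# Made-up tweets, for illustration only.
example = ["Training for trainees", "Training events", "The college trains surgeons"]
print(term_frequencies(example))
```

Here “training” and “trains” share the stem “train”, so they are counted together and reported under the more common form.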

So what can be said? I was interested in whether Colleges tweet about training. I was pleased to see that the UK colleges do – a fair amount. Terms that are associated with training were less apparent in tweets from the RCSI and ACS.

The figures below show clustering of terms within tweets, with term frequency increasing from left to right. There are some nice themes that emerge. In the RCSEng tweets there are themes relating to “training”, “events”, “working time”, and “the NHS”. Similar subjects are apparent in RCSEd tweets, with prominence of their medical students’ surgical skills competition and issues specifically relating to the NHS in Scotland. As the RCPSG have only started tweeting, associations are greatly influenced by individual tweets. The RCSI’s “Transition Year Mini Med School Programme” “MiniMed School Open Lecture Series” (updated 22/04/13) can be seen together with conference promotion. The ACS appear to use Twitter to communicate issues relating to patient health improvement programmes more prominently than other Colleges.

Network plots illustrate the strength of association of terms (weight of edges) and frequency of terms (font size of vertices). Do the terms in these plots represent the core values of these organisations?

### Publication of paediatric cardiac surgery results

The National Institute for Cardiovascular Outcomes Research (NICOR) has published the results of its investigation into mortality after paediatric heart surgery in England 2009-12.

The short report has two main findings – the quality of data collection at Leeds General Infirmary (LGI) was woeful, and differences in mortality between all hospitals are likely to be explained by natural variation.

The ability of an institution to collect and audit its own results can be viewed as a measure of organisational health. As can be seen in the table, the performance of LGI in this respect was terrible, and much worse than other units. A cause for concern in itself.

On the more controversial point of whether the mortality rate in LGI was worse than other centres, no convincing proof of this has been found.

The funnel plot below shows the number of expected deaths along the bottom. Centres performing greater numbers of procedures have a greater number of expected deaths, just by common sense.

These numbers have been corrected for the difference in the types of patients and surgery performed in hospitals – the specific procedure performed, patient age, weight, diagnosis, and previous medical conditions. All these factors impact on the risk of death following surgery.

Any hospital above the black horizontal line has a greater number of deaths than predicted and any hospital below has fewer.

By “the law of averages”, it would be expected that there was a roughly equal spread of hospitals above and below the line.

As can be seen, Alder Hey, Guy’s, and LGI are all close to triggering an “alert”.

The report rightly states that these units “may deserve additional scrutiny and monitoring of current performance”.

The 3-year risk adjusted mortality rate in LGI is 1.47 times the national average – lower than the “twice the national average” first reported.

The unambiguous message? Data collection and real-time analysis are core business in healthcare. Government and the NHS still do not have a grip on this. There are many more stories of significant differences between hospitals, hidden in poor quality data that no one is looking at.

### Mortality after paediatric heart surgery using public domain data

This post comes with some big health warnings.

The recent events in Leeds highlight the difficulties faced in judging the results of surgery by individual hospital. A clear requirement is timely access to data in a form easily digestible by the public.

Here I’ve scraped the publicly available data from the central cardiac audit database (CCAD). All the data are available at the links provided and are as presented this afternoon (06/04/2013). Please read the caveats carefully.

Hospital-specific 30-day mortality data are available for certain paediatric heart surgery procedures for 2009-2012. These data are not complete for 2011-12 and there may be missing data for earlier years. There may be important data for procedures not included here that should be accounted for. There is no case-mix adjustment.

All data are included in spreadsheets below as well as the code to run the analysis yourself, to ensure no mistakes have been made. Hopefully these data will be quickly superseded with a quality-assured update.

## Mortality by centre

The funnel plot below has been generated by taking all surgical procedures performed from pages such as this and expressing all deaths within 30 days as a proportion of the total procedures performed by hospital.

The red horizontal line is the mean mortality rate for these procedures – 2.3%. The green, blue and red curved lines are decreasingly stringent control limits within which unit results may vary by chance.
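For anyone wanting to see where curved limits like these come from, here is a minimal Python sketch using the normal approximation to the binomial. It is not the script used for the plot. The 2.3% mean is the figure quoted above; the multipliers (1.96 and 3.09, roughly 95% and 99.8% limits) are assumptions, since the levels of the green, blue and red curves are not stated here.

```python
import math


def funnel_limits(mean_rate, n_procedures, z):
    """Approximate binomial control limits around the overall mean rate."""
    se = math.sqrt(mean_rate * (1 - mean_rate) / n_procedures)
    lower = max(0.0, mean_rate - z * se)  # a rate cannot be negative
    upper = min(1.0, mean_rate + z * se)
    return lower, upper


MEAN_RATE = 0.023  # overall mean mortality quoted in the post
Z_95 = 1.96        # assumed multiplier for ~95% limits
Z_998 = 3.09       # assumed multiplier for ~99.8% limits

for n in (100, 400, 1600):
    lo95, hi95 = funnel_limits(MEAN_RATE, n, Z_95)
    lo998, hi998 = funnel_limits(MEAN_RATE, n, Z_998)
    print(f"n={n}: 95% {lo95:.3f}-{hi95:.3f}, 99.8% {lo998:.3f}-{hi998:.3f}")
```

The limits narrow as the number of procedures grows, which gives the funnel its shape. With small numbers and a low mean rate the normal approximation is poor, so exact binomial limits would be a better choice for a real analysis.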

## Mortality by procedure

The mortality associated with different procedures can be explored with this Google motion chart. Note the great year-to-year variation when a procedure is uncommon (to the left of the chart). These bouncing balls trace out the limits of a funnel plot. They highlight why year-to-year differences in mortality rates for rare procedures must be interpreted with caution.

## Script

### Two simple tests for summary data

Here are two handy scripts for hypothesis testing of summary data. I seem to use these a lot when checking work:

• Chi-squared test of association for categorical data.
• Student’s t-test for difference in means of normally distributed data.

The actual equations are straightforward, but get involved when group sizes and variance are not equal. Why do I use these a lot?!
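To give a flavour of the first test, here is a minimal Python sketch of a chi-squared test on a 2×2 table of counts. It is not the script referred to above: the function name and the counts are made up for illustration, and no continuity correction is applied. With one degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical summary data: 10/50 events in one group versus 20/50 in the other.
chi2, p = chi_squared_2x2(10, 40, 20, 30)
print(f"chi-squared = {chi2:.2f}, df = 1, p = {p:.3f}")
```

Larger tables need the general chi-squared distribution, which is where a statistics package earns its keep.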

I wrote about a study from Hungary in which the variability in the results seemed much lower than expected. We wondered whether the authors had made a mistake in saying they were showing the standard deviation (SD), when in fact they had presented the standard error of the mean (SEM).

This is a bit of table 1 from the paper. It shows the differences in baseline characteristics between the treated group (IPC) and the active control group (IP). In it, they report no difference between the groups for these characteristics, p>0.05.

But taking “age” as an example and using the simple script for a Student’s t-test with these figures, the answer we get is different. Mean (SD) for group A vs. group B: 56.5 (2.3) vs. 54.8 (1.8), t=4.12, df=98, p<0.001.

There are lots of similar examples in the paper.

Treating the reported figures as standard errors of the mean rather than standard deviations gives a non-significant difference, as expected.

$SEM=SD/\sqrt{n}.$
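The age comparison can be checked by hand with a short Python sketch (again, not the original script). The means and spreads are the figures quoted above; the group size of 50 is an assumption, inferred from df=98 and taking the two groups to be equal. The critical value of about 1.98 for a two-sided test at the 5% level with 98 degrees of freedom is hard-coded rather than computed.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t statistic (equal variances assumed) from summary data."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


N = 50             # assumption: df = 98 implies 100 patients, taken as 50 per group
T_CRITICAL = 1.98  # approximate two-sided 5% critical value for df = 98

# Reading the reported spread as a standard deviation, as the paper states.
t_sd, df = pooled_t(56.5, 2.3, N, 54.8, 1.8, N)

# Reading it as a standard error of the mean instead: SD = SEM * sqrt(n).
t_sem, _ = pooled_t(56.5, 2.3 * math.sqrt(N), N, 54.8, 1.8 * math.sqrt(N), N)

print(f"as SD:  t = {t_sd:.2f}, df = {df}, significant = {abs(t_sd) > T_CRITICAL}")
print(f"as SEM: t = {t_sem:.2f}, df = {df}, significant = {abs(t_sem) > T_CRITICAL}")
```

With equal group sizes the second statistic is simply the first divided by the square root of n, which is why the apparent difference disappears.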

See here for how to get started with R.

### Statistical errors in published medical studies

I do a fair amount of peer-review for journals. My totally subjective impression – which I can’t back up with figures – is that fundamental errors in data analysis occur on a fairly frequent basis. Opaque descriptions of methods and no access to raw data often make errors difficult to detect.

We’re performing a meta-analysis at the moment. This is a study in which two or more clinical trials of the same treatment are combined. This can be useful when there is uncertainty about the effectiveness of a treatment.

Relevant trials are rigorously searched for and the quality assessed. The results of good quality trials are then combined, usually with more weight being given to the more reliable trials. This weight reflects the number of patients in the trial and, for some measures, the variability in the results. This variation is important – trials with low variability are greatly influential in the final results of the meta-analysis.
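One common way of doing this weighting is the inverse-variance method, sketched here in Python as a simple fixed-effect model. This is an illustration of the principle rather than the analysis we are running, and the trial results in it are invented.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of trial estimates."""
    weights = [1 / se**2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    shares = [w / total for w in weights]  # fraction of total weight per trial
    return pooled, pooled_se, shares


# Invented mean differences and standard errors for three trials.
# The third trial reports very little variability.
estimates = [-10.0, -12.0, -40.0]
standard_errors = [5.0, 6.0, 1.0]

pooled, pooled_se, shares = pool_fixed_effect(estimates, standard_errors)
print(f"pooled difference = {pooled:.1f} (SE {pooled_se:.2f})")
print("share of weight per trial:", [f"{s:.0%}" for s in shares])
```

The low-variability trial takes nearly all of the weight and drags the pooled result towards its own estimate, which is why an under-reported measure of spread matters so much.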

What are we doing the meta-analysis on? We often operate to remove a piece of liver due to cancer. Sometimes we have to clamp the blood supply to the liver to prevent bleeding. An obvious consequence of this is damage to the liver tissue.

It may be possible to protect the liver (and any organ) from these damaging effects by temporarily clamping the blood supply for a short time, then releasing the clamp and allowing blood to flow back in. The clamp is then replaced and the liver resection performed. This is called “ischemic preconditioning” and may work by stimulating liver cells to protect themselves. “Batten down the hatches boys, there’s a storm coming!”

Results of this technique are controversial – when used in patients some studies show it works, some show no benefit. So should we be using it in our day-to-day practice?

We searched for studies examining ischemic preconditioning and found quite a few.

One in particular, performed by surgeons in Hungary, seemed to show that the technique worked very well (1). The variability in this study was low as well, so it seemed reliable. Actually the variability was very low – lower than all the other trials we found.

The graph shows 3 of the measures used to determine success of the preconditioning. The first two are enzymes released from damaged liver cells and the third, bilirubin, is processed by the liver. All the studies show some lowering of these measures signifying potential improvement with the treatment. But most trials show a lot of variation between different patients (the vertical lines).

Except the Hungarian study, which shows almost no variation.

Even compared with a study in which these tests were repeated between healthy individuals in the US (9), the variation was low. That seemed strange. Surely the day-to-day variation in your or my liver tests should be lower than those of a group of patients undergoing surgery?

It looks like a mistake.

It may be that the authors wrote that they used one measure of variation when they actually used another (standard error of the mean vs. standard deviation). This could be a simple mistake; the details are here.

But we don’t know. We wrote three times, but they didn’t get back to us. We asked the journal and they are looking into it.

1 Hahn O, Blázovics A, Váli L, et al. The effect of ischemic preconditioning on redox status during liver resections-randomized controlled trial. Journal of Surgical Oncology 2011;104:647–53.
2 Clavien P-A, Selzner M, Rüdiger HA, et al. A Prospective Randomized Study in 100 Consecutive Patients Undergoing Major Liver Resection With Versus Without Ischemic Preconditioning. Annals of Surgery 2003;238:843–52.
3 Li S-Q, Liang L-J, Huang J-F, et al. Ischemic preconditioning protects liver from hepatectomy under hepatic inflow occlusion for hepatocellular carcinoma patients with cirrhosis. World J Gastroenterol 2004;10:2580–4.
4 Choukèr A, Martignoni A, Schauer R, et al. Beneficial effects of ischemic preconditioning in patients undergoing hepatectomy: the role of neutrophils. Arch Surg 2005;140:129–36.
5 Petrowsky H, McCormack L, Trujillo M, et al. A Prospective, Randomized, Controlled Trial Comparing Intermittent Portal Triad Clamping Versus Ischemic Preconditioning With Continuous Clamping for Major Liver Resection. Annals of Surgery 2006;244:921–30.
6 Heizmann O, Loehe F, Volk A, et al. Ischemic preconditioning improves postoperative outcome after liver resections: a randomized controlled study. European Journal of Medical Research 2008;13:79.
7 Arkadopoulos N, Kostopanagiotou G, Theodoraki K, et al. Ischemic Preconditioning Confers Antiapoptotic Protection During Major Hepatectomies Performed Under Combined Inflow and Outflow Exclusion of the Liver. A Randomized Clinical Trial. World J Surg 2009;33:1909–15.
8 Scatton O, Zalinski S, Jegou D, et al. Randomized clinical trial of ischaemic preconditioning in major liver resection with intermittent Pringle manoeuvre. Br J Surg 2011;98:1236–43.
9 Lazo M, Selvin E, Clark JM. Brief communication: clinical implications of short-term variability in liver function test results. Ann Intern Med 2008;148:348–52.

### Leeds paediatric heart surgery: managing outliers

Children’s heart surgery in Leeds has been suspended. Concerns about an excess in mortality have been raised and denied, and I have written about seemingly large variations in mortality (“twice the national average”) being explained by chance.

In June 1998, the then Secretary of State for Health announced the establishment of an inquiry into the management of the care of children receiving complex cardiac surgery at Bristol Royal Infirmary between 1984 and 1995. The inquiry identified failures that contributed to the death of children undergoing heart surgery and the 529-page report was a blueprint for wider reform of the NHS.

Funnel plots are useful for comparing the results of surgery between hospitals. The funnel plots below are from here and are for open cardiac surgery in children under one year in the UK 1991-1995. The Cardiac Surgery Registry (CSR) and Hospital Episode Statistics (HES) data were used to compare institutions. The horizontal dotted line is the national average and the curved dotted line is the limit of variation which might be expected by chance (95% confidence interval). The “O” is Bristol Royal Infirmary and “*” the eleven other UK centres. Bristol, as became apparent, was a clear outlier.

## How should we deal with outliers?

The question is pertinent given the recent suspension of Leeds General Infirmary from performing children’s cardiac surgery. The UK Department of Health produced guidelines in 2011 on the recommended process should a unit hit the dotted line, summarised below.

### Stage 1 | 10 days

Hospitals with a performance indicator ‘alert’ or ‘alarm’ require scrutiny of the data handling and analyses performed, leading to one of two outcomes.

Either:

• potential outlier status not confirmed;
• data and results revised in clinical audit records;
• details formally recorded.

Or:

• potential outlier status;
• proceed to stage 2.

### Stage 2 | 5 days

The Lead Clinician in the hospital is informed about the potential outlier status and requested to identify any data errors or justifiable explanations. All relevant data and analyses should be made available to the Lead Clinician.

A copy of the request should also be sent to the Clinical Governance Lead of the hospital.

### Stage 3 | 25 days

Lead Clinician to provide written response to national clinical audit team.

### Stage 4 | 30 days

Review of Lead Clinician’s response to determine:

• It is confirmed that the data originally supplied by the provider contained inaccuracies. Reanalysis of accurate data no longer indicate outlier status;
• Data and results should be revised in clinical audit records. Details of the hospital’s response and the review result recorded;
• Lead Clinician notified in writing.

• It is confirmed that although the data originally supplied by the provider were inaccurate, analysis still indicates outlier status; or
• It is confirmed that the originally supplied data were accurate, thus confirming the initial designation of outlier status;
• proceed to stage 5.

### Stage 5 | 5 days

Contact Lead Clinician by telephone, prior to written confirmation of potential outlier status; copied to clinical governance lead, medical director and chief executive. All relevant data and statistical analyses, including previous response from the lead clinician, made available to the medical director and chief executive.

Chief executive advised to inform relevant bodies about the concerns: primary care trusts, Strategic Health Authority, professional society/association, and Care Quality Commission. Informed that the audit body will proceed to publishing information of comparative performance that will identify providers.

### Stage 6 | 10 days

Chief executive acknowledgement of receipt of the letter.

### Stage 7

Public disclosure of comparative information that identifies providers (eg annual report of NCA).

## The Situation in Leeds

It appears that in Leeds the process is at stage 2 – the local doctors have just been informed. The guidance suggests the identity of the statistical outliers should be anonymous at this stage. It may be that concerns were so great that special circumstances dictated the dramatic public announcement. We should find out in the next few weeks.

### R function to retrieve pubmed citations from pmid number

This is useful if you have hundreds of PMIDs and need specific fields from the PubMed/MEDLINE citation.
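The general approach can be sketched in Python (this is not the R function itself). It assumes the NCBI E-utilities `efetch` endpoint with `db=pubmed` and `retmode=xml`, and reads three fields from the returned XML. The function names are made up for the example and no network request is made in the block.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# NCBI E-utilities endpoint for fetching full records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch URL asking for PubMed records as XML."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def parse_citations(xml_text):
    """Pull PMID, article title and journal title from PubMed XML."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue
        records.append(
            {
                "pmid": citation.findtext("PMID"),
                # Titles containing inline markup would need itertext() instead.
                "title": citation.findtext("Article/ArticleTitle"),
                "journal": citation.findtext("Article/Journal/Title"),
            }
        )
    return records
```

To use it, download the text at `build_efetch_url(...)` (for example with `urllib.request.urlopen`) and pass it to `parse_citations`. For hundreds of PMIDs it is worth sending them in batches and pausing between requests to stay within NCBI's usage limits.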