
Thursday, 27 February 2014

Politicians badly briefed about data issues too?

At the end of the Parliamentary meeting on 'Patient rights and access to NHS data', the Parliamentary Under Secretary of State for Public Health, Jane Ellison MP, made the following statement:

"I should actually just before we close put on the record Mr Emerson, forgive me, but I think it is useful for colleagues, just in regard to the Faculty of Actuaries and the data there, and I think actually the Shadow Minister also alluded to this; just to put on the public record that the data that they used was publicly available, non-identifiable and in aggregate form."

The data, as described in this blog post, was not publicly available or in aggregate form. It was individual-level data that had to be specifically requested from the NHSIC.

Jane Ellison may have been referring to the report generated from the research, which is indeed publicly available and contains aggregate analyses from which it would be impossible to identify an individual. But no one has raised concerns about the report.

You can watch her statement here at 16.16.30.

Wednesday, 26 February 2014

We need data-literate journalists.

So what do you think happened recently with NHS data? Do you think that the NHS handed over the records of millions of patients to insurers who then looked up their credit records and suggested that their insurance premiums should be changed?

That would be awful, wouldn't it? Just as well it didn't happen.

I came back from visiting family on Sunday night to see the Telegraph story "Hospital records of all NHS patients sold to insurers" being tweeted. It was picked up by many other papers, including the Guardian.

The story is about research done by a group of actuaries, the Critical Illness Definitions and Geographical Variations Working Party. You can find the full report here. It's a document of more than 200 pages, written to explain to other actuaries how 'geodemographic' data might help predict how likely someone is to develop a critical illness. If you have ever applied for life insurance, critical illness cover or an income protection policy, you will know that your policy is priced based on your risk: your age, weight, smoking status, what illnesses you have. If you know anything about health inequalities, you will know that beyond our own personal risk factors (age, weight, etc.) our social circumstances are important in determining how long we will live and whether we will get sick. And you can tell a lot about our social circumstances from where we live. The relationship is so strong that there is a postcode mortality lottery. Your postcode *might* reflect your lifestyle, your wealth, your education: all the things that predict how likely you are to get sick or to live long.

Geodemographic classifications, such as CACI's Acorn and Experian's Mosaic, sort postcodes into strange-sounding groups like 'happy families' and 'twilight subsistence' based on information obtained from public and commercial sources. The research by the actuaries was about whether these postcode classifications could predict when people develop serious illnesses. You can read the report to find out more, but the short answer is that they do.

You may disagree with the idea that your postcode should be used to predict your risk to insurers. Is it a smart way of doing things? You can read some discussion of this in Tony Hirst's blog post here.

Most of the discussion was not about this though. It was about the fact that 'hospital records' were given to insurers.

So what actually happened in the research? 


The above tweet by Roger quotes the Guardian's coverage of this story. Are Acorn and Mosaic 'credit ratings data'? Well, yes, they may originally have been developed to predict how likely you were to be able to pay back a loan. But as we can see, they can also predict how likely you are to get sick or to die.

What did the hospital records look like? They were Hospital Episode Statistics. This is what the data looked like (from this presentation):
Is that what you thought the 'hospital records' would look like?

Did the researchers have full postcodes and dates of birth? It was a bit hard to tell from the report. I presumed they didn't, because I couldn't see why they needed them. And I didn't think the NHS was likely to give away information that would make it easy for individuals to be re-identified. But a full postcode was needed to assign a 'geodemographic profile' to each person in the dataset. The following screenshot is from page 10 of the report.


I read this as meaning that the geodemographics were added to the HES dataset by the NHSIC, who then provided the dataset to the insurers. But others, including Tony Hirst, first read it as meaning that the researchers did the data linking. Who did the data linking mattered, because whoever did it needed the full postcode.

Today, after reading an article by Wired which stated that the hospital data was given to the Institute and Faculty of Actuaries (IFOA) and "was then combined with secondary sources, including Experian credit ratings data, in order to influence insurance premiums", I decided that I had to find out. So I phoned the press office of the IFOA on the number I found on the press release of their rebuttal to the Telegraph article.

I got straight through. The press officer directed me to page 10, above. I asked who had done the data linking and they said it was the NHSIC. This made sense and fitted with their statement that they had no identifiable information about the individuals in the dataset. They only had an age group and the first part of each postcode.
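
To make the distinction concrete, here is a rough sketch in Python of what a linkage of this kind could look like if it is done by the data provider before release. The column names and values are entirely hypothetical, not the real HES or NHSIC fields; the point is only that the full postcode is needed for the join but never has to leave the provider's hands.

```python
import pandas as pd

# Hypothetical HES-style extract held by the data provider (NHSIC).
# The full postcode and date of birth never leave this environment.
hes = pd.DataFrame({
    "patient_id": [1, 2],
    "full_postcode": ["LS1 4DY", "M1 1AE"],
    "date_of_birth": ["1957-03-02", "1964-11-19"],
    "diagnosis_code": ["I21", "C50"],
})

# Hypothetical commercial lookup mapping full postcodes to a
# geodemographic classification (in the style of Acorn or Mosaic).
geodemo = pd.DataFrame({
    "full_postcode": ["LS1 4DY", "M1 1AE"],
    "geodemographic_group": ["Urban Prosperity", "Hard-Pressed Living"],
})

# The linkage itself needs the full postcode...
linked = hes.merge(geodemo, on="full_postcode", how="left")

# ...but the extract released to researchers keeps only coarse fields:
# an age band, the outward (first) part of the postcode, the diagnosis
# and the geodemographic group.
linked["age_band"] = pd.cut(
    2010 - pd.to_datetime(linked["date_of_birth"]).dt.year,
    bins=[0, 40, 50, 60, 70, 120],
    labels=["<40", "40-49", "50-59", "60-69", "70+"],
)
linked["postcode_district"] = linked["full_postcode"].str.split().str[0]

released = linked[["age_band", "postcode_district",
                   "diagnosis_code", "geodemographic_group"]]
print(released)
```

Done this way round, the recipients only ever see the coarse released fields, which is consistent with what the IFOA told me.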

So how many people do you think contacted the IFOA to try to make the same clarifications as me? Every journalist who had written a story about this, perhaps? No: three people. The BBC and two bloggers. I was one of them.

Why didn't other journalists get in touch with them? Didn't they understand the significance of this? Didn't they care?

In the next few months and years we are going to be having many conversations about big data. We need journalists who know how to ask the right questions. And at the moment it looks as if we don't.

If you think that the problem is that actuaries were given NHS data at all, then see this.
EDIT: In the past, GPRD data was provided to actuaries. This is no longer the case, although at least one application was made recently to CPRD, which rejected it.
 


Saturday, 16 April 2011

Lies, damned lies and statistics: How do you turn 61% into 95%?

Image from "Working together for a stronger NHS", Crown Copyright


Edit: 13/05/11 An analysis of the BSA 2007 dataset by Siobhan Farmer, Mark Hawker and myself has been published today in the new publication Lancet UK Policy Matters. You can find it here. We conclude that the data do not support the claim that 95% of respondents wished for more choice. Using the kind of suppositions given below, it may be possible to infer that between 61% and 72% thought there should be more choice, but the survey was not designed to answer this question.
..........................................................................................................................................................
Edit: 17/4/11 A colleague and I have independently tried to verify Mark's analysis. We have each reached similar conclusions, but they do not corroborate Mark's results. Unfortunately he is currently outside the UK; we will update this post with our own results. We are in agreement that there is still no justification for claiming that "95% of people want more choice in healthcare".

..................................................................................................................................................................
Today many of you will have read Ben Goldacre's excellent analysis of the leaflet that the Department of Health issued last week to help the public understand why it is pursuing reforms of the NHS that are facing widespread opposition.

Page 11 of the leaflet contains the graphic above, stating that 95% of those surveyed in the 25th British Social Attitudes Survey wanted MORE choice in the NHS. When Ben looked at the published reports he found that 'Do you want more choice in the NHS?' was not a question in the survey. Instead, respondents were asked 'How much choice do you think you should have?' and 'How much choice do you actually have?' But if you could establish how many people thought they should have more choice than they currently believe they have, then you could say how many people think they should have more choice. As Ben points out, to answer this you would need the individual-level data. When he asked the DOH for the dataset they unfortunately pointed him to a book chapter.

But fortunately, on the day this leaflet was published, April 6th 2011, Mark Hawker started wondering whether he could find out more about this dataset. And he did. The individual-level data is available to download, so Mark downloaded it. Then he analysed it. And what did he find? Well, you can read a lot more in the blog post he published on April 7th, but here is a summary.

13% thought they had more choice than they thought they should have.
46% thought they had just the right amount of choice.
41% thought they should have more choice than they had.
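
To see why the individual-level data matters, here is a minimal sketch in Python of the kind of cross-tabulation involved. The responses and column names are made up for illustration, not the real BSA 2007 variables: the idea is simply that, for each respondent, you compare how much choice they think they should have with how much they think they actually have.

```python
import pandas as pd

# Made-up responses on a 1-3 ordinal scale (1 = none, 3 = a great deal);
# the real BSA 2007 variable names and codes differ.
df = pd.DataFrame({
    "choice_should_have":   [3, 2, 2, 3, 1, 2],
    "choice_actually_have": [2, 2, 3, 1, 1, 2],
})

# Pair the two answers within each respondent.
wants_more = (df["choice_should_have"] > df["choice_actually_have"]).mean()
wants_same = (df["choice_should_have"] == df["choice_actually_have"]).mean()
wants_less = (df["choice_should_have"] < df["choice_actually_have"]).mean()

# Only this pairing tells you what share think they should have MORE
# choice than they currently have; the published marginal percentages
# for each question on its own cannot answer that.
print(f"want more choice: {wants_more:.0%}")
print(f"happy with current amount: {wants_same:.0%}")
print(f"want less choice: {wants_less:.0%}")
```
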

So how did the DOH manage to get this so wrong? How did they confuse 41% with 95%? Why weren't they able to direct Ben Goldacre to the correct data source? And why have they decided not to fund this survey in the future?

Hopefully someone can help make the correct data look just as pretty as the incorrect infographic in the leaflet. In the meantime I think I'd like to thank Mark for his work, and to agree with this tweet: