That would be awful, wouldn't it? Just as well it didn't happen.
I came back from visiting family on Sunday night to find the Telegraph story "Hospital records of all NHS patients sold to insurers" being tweeted. It was picked up by many other papers, including the Guardian.
The story is about research done by a group of actuaries, the Critical Illness Definitions and Geographical Variations Working Party. You can find the full report here. It's a 200-plus-page document written to explain to other actuaries how 'geodemographic' data might help predict how likely someone is to develop a critical illness. If you have ever applied for life insurance, critical illness cover or an income protection policy, you will know that your policy is priced according to your risk: your age, weight, smoking status and any illnesses you have. If you know anything about health inequalities, you will know that beyond our own personal risk factors (age, weight, etc.), our social circumstances are important in determining how long we will live and whether we will get sick. And you can tell a lot about our social circumstances from where we live. The relationship is so strong that there is a postcode mortality lottery. Your postcode *might* reflect your lifestyle, your wealth and your education: all the things that predict how likely you are to get sick or to live a long life.
Geodemographic classifications, such as CACI's Acorn and Experian's Mosaic, sort postcodes into strange-sounding groups like 'happy families' and 'twilight subsistence' based on information obtained from public and commercial sources. The actuaries' research asked whether these postcode classifications could predict when people developed serious illnesses. You can read the report to find out more, but the short answer is that they do.
You may disagree with the idea that your postcode should be used by insurers to predict your risk. Is it a smart way of doing things? You can read some discussion of this in Tony Hirst's blog post here.
Most of the discussion was not about this, though. It was about the claim that 'hospital records' had been given to insurers.
So what actually happened in the research?
"patients' medical histories, identified by date of birth and postcode, were combined with credit ratings data" http://t.co/oRhzGgazvt
— roger kline (@rogerkline) February 24, 2014
The above tweet by Roger quotes the Guardian's coverage of this story. Are Acorn and Mosaic 'credit ratings data'? Well, yes, they may have been originally developed to predict how likely you were to be able to pay back a loan. But as we can see they can also predict how likely you are to get sick or to die.
What did the hospital records look like? They were Hospital Episode Statistics (HES). This is what the data looked like (from this presentation):
Is that what you thought the 'hospital records' would look like?
Did the researchers have full postcodes and dates of birth? It was a bit hard to tell from the report. I presumed they didn't, because I couldn't see why they would need them. And I didn't think the NHS was likely to give away information that would make it easy for individuals to be re-identified. But the full postcode was needed to assign a 'geodemographic profile' to each person in the dataset. The following screenshot is from page 10 of the report.
I read this as meaning that the geodemographic classifications were added to the HES dataset by the NHSIC, which provided the dataset to the insurers. But others, including Tony Hirst, first read it as meaning that the researchers did the data linking themselves. Who did the data linking mattered, because whoever did it needed the full postcode.
Today I read an article in Wired stating that the hospital data was given to the Institute and Faculty of Actuaries (IFOA) and "was then combined with secondary sources, including Experian credit ratings data, in order to influence insurance premiums." I decided I had to find out, so I phoned the IFOA press office on the number given in their press release rebutting the Telegraph article.
I got straight through. The press officer directed me to page 10 of the report. I asked who had done the data linking, and they said it was the NHSIC. This made sense and fitted with their statement that they had no identifiable information about the individuals in the dataset. They had only an age group and the first part of each person's postcode.
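To make concrete why whoever does the linking needs the full postcode, while the recipient can get by with much less, here is a minimal Python sketch of a "link, then strip" step. It is an illustration under stated assumptions, not the NHSIC's actual process: the postcodes, the lookup table, the diagnosis field and the 10-year age bands are all hypothetical, and the group names are simply the two examples mentioned earlier in this post.

```python
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL lookup from full postcode to geodemographic group.
# Real classifications (Acorn, Mosaic) are commercial products; these
# postcodes are invented and the group names are just examples.
GEO_LOOKUP = {
    "AB1 2CD": "happy families",
    "EF3 4GH": "twilight subsistence",
}

@dataclass
class HesRecord:
    """Hypothetical identifiable record, held only by the data linker."""
    date_of_birth: date
    postcode: str       # full postcode
    diagnosis: str

@dataclass
class ReleasedRecord:
    """Hypothetical record as seen by the researchers after linking."""
    age_band: str
    postcode_district: str  # outward code only, e.g. "AB1"
    geo_group: str
    diagnosis: str

def age_band(dob: date, on: date, width: int = 10) -> str:
    """Convert a date of birth into an age band (band width is an assumption)."""
    # Subtract one if the birthday has not yet happened in the year of `on`.
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def link_and_release(records: list[HesRecord], on: date) -> list[ReleasedRecord]:
    """Attach a geodemographic group, then drop the identifying fields."""
    released = []
    for r in records:
        # The lookup is keyed on the FULL postcode, so only the linker needs it.
        group = GEO_LOOKUP.get(r.postcode, "unclassified")
        released.append(ReleasedRecord(
            age_band=age_band(r.date_of_birth, on),
            postcode_district=r.postcode.split()[0],  # keep the first part only
            geo_group=group,
            diagnosis=r.diagnosis,
        ))
    return released

if __name__ == "__main__":
    sample = [HesRecord(date(1960, 5, 1), "AB1 2CD", "stroke")]
    for rec in link_and_release(sample, on=date(2014, 2, 24)):
        print(rec)
```

Run on the sample record, this prints a record with age band `50-59`, district `AB1` and group `happy families`, with no date of birth or full postcode. The design point is that the lookup happens before the identifiers are discarded, so the recipient never needs them.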
So how many people do you think contacted the IFOA to seek the same clarification as me? Every journalist who had written a story about this, perhaps? No: three people. The BBC and two bloggers. I was one of them.
Why didn't other journalists get in touch with them? Didn't they understand the significance of this? Didn't they care?
In the coming months and years we are going to be having many conversations about big data. We need journalists who know how to ask the right questions. At the moment, it looks as if we don't have them.
If you think the problem is that actuaries were given NHS data at all, then see this.
EDIT: In the past, GPRD data was provided to actuaries. This is no longer the case, although at least one application was made recently to CPRD (GPRD's successor), which rejected it.