UPDATE: 23.01.15 The ICO has responded [.doc file] to my request for a review of their decision. I drew their attention to the arguments on this page but they don’t even mention them, let alone provide a counter-analysis, in dismissing my complaints (“Having reviewed the matter, I agree with the explanations provided”). I am invited by the ICO to consider taking my own legal action. I understand that the ICO and I might have differing views on a DPA matter, but what I find difficult to accept is the refusal even to enter into a discussion with me about the detailed arguments I’ve made. END UPDATE
In February this year I asked the Information Commissioner’s Office (ICO) to investigate reports that Hospital Episode Statistics (HES) data had apparently been sold to an actuarial society by the NHS Information Centre (NHSIC), the predecessor to the Health and Social Care Information Centre (HSCIC). Specifically I requested, as a data subject can under s42 of the Data Protection Act 1998 (DPA), that the ICO assess whether it was likely or not that the processing of my personal data by NHSIC and others had been in compliance with the DPA.
Nine months later, I was still awaiting the outcome. But a clue to how the assessment would turn out was contained in the text of Sir Nick Partridge’s six month review of various data releases by NHSIC (his original report in June seemed to me to point to multiple potential DPA contraventions). In the review document he says
Six investigations have been separately instigated by the HSCIC or Information Commissioner’s Office (ICO)and shared with both parties as these focussed on whether individuals were at risk of being identified. In the cases it has investigated, the ICO has upheld the HSCIC approach and informed us that it has “seen no evidence to suggest that re-identification has occurred or is reasonably likely to occur.”
Following the recent issue regarding HSCIC, PA Consulting, and Google we investigated the issue of whether HES data could be considered personal data. This detailed work involved contacting HSCIC, PA Consulting, and Google and included the analysis of the processes for the extraction and disclosure of HES data both generally and in that case in particular. We concluded that we did not consider that the HES dataset constitutes personal data.Furthermore we also investigated whether this information had been linked to other data to produce “personal data” which was subject to the provisions of the Act. We have no evidence that there has been any re-identification either on the part of PA Consulting or Google. We also noted that HSCIC have stated that the HES dataset does not include individual level patient data even at a pseudonymised level. Our view is that the data extracted and provided to PA Consulting did not identify any individuals and there was no reasonable likelihood that re-identification would be possible.
Recital 26 signifies that to anonymise any data, the data must be stripped of sufficient elements such that the data subject can no longer be identified. More precisely, that data must be processed in such a way that it can no longer be used to identify a natural person by using “all the means likely reasonably to be used”
Anonymisation has also been subject to judicial analysis, notably in the Common Services Agency case, but, even more key, in the judgment of Mr Justice Cranston in Department of Health v Information Commissioner ( EWHC 1430). The latter case, involving the question of disclosure of late-term abortion statistics, is by no means an easy judgment to parse (ironically so, given that it makes roughly the same observation of the Common Services Agency case). The judge held that the First-tier Tribunal had been wrong to say that the statistics in question were personal data, but that it had on the evidence been entitled to say that “the possibility of identification by a third party from these statistics was extremely remote”. The fact that the possibility of identification by a third party was extremely remote meant that “the requested statistics were fully anonymised” (¶55). I draw from this that for personal data to be anonymised in statistical format the possibility of identification of individuals by a third party must be extremely remote. The ICO’s Anonymisation Code, however, says of the case:
The High Court in the Department of Health case above stated that the risk of identification must be greater than remote and reasonably likely for information to be classed as personal data under the DPA [emphasis added]
But this seems to me to be an impermissible description of the case – the High Court did not state what the ICO says it stated – the phrases “greater than remote” and “reasonably likely” do not appear in the judgment. And that phrase “reasonably likely” is one that, as I say, makes it way into the Partridge Review, and the ICO’s assessment of the lawfulness of HES data “sale”.
I being to wonder if the ICO has taken the phrase from recital 26 of the Directive, which talks about the need to consider “all the means likely reasonably to be used” to identify an individual, and transformed it into a position from which, if identification is not reasonably likely, it will accept that data are anonymised. This cannot be right: there is a world of difference between a test which considers whether possibility of identification is “extremely remote” and whether it is “reasonably likely”.
I do not have a specific right to a review of the section 42 assessment decision that the processing of my personal data was likely in compliance with NHSIC’s obligations under the DPA, but I have asked for one. I am aware of course that others complained (après moi, la deluge) notably, in March, FIPR, MedConfidential and Big Brother Watch . I suspect they will also be pursuing this.
In October this year I attended an event at which the ICO’s Iain Bourne spoke. Iain was a key figure in the drawing up of the ICO’s Anonymisation Code, and I took the rather cheeky opportunity to ask about the HES investigations. He said that his initial view was that NHSIC had been performing good anonymisation practice. This reassured me at the time, but now, after considering this question of whether the Anonymisation Code (and the ICO) adopts the wrong test on the risks of identification, I am less reassured. Maybe “reasonably likely that an individual can be identified” is an appropriate test for determining when data is no longer anonymised, and becomes personal data, but it does not seem to me that the authorities support it.
Postscript Back in August of this year I alerted the ICO to the fact that a local authority had published open data sets which enabled individuals to be identified (for instance, social care and housing clients). More than four months later the data is still up (despite the ICO saying they would raise the issue with the council): is this perhaps because the council has argued that the risk of identification is not “reasonably likely”?
The views in this post (and indeed all posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.
3 responses to “The wrong test for anonymisation?”
Only nine months, you must me patient because it has taken me nearly 6 years to get the ICO to the Court of Appeal ref the infamous Dransfield vexatious case.
Watch this space for updates
I agree that there’s a world of difference between the two tests, but in my opinion the difference goes even further than what you describe. It’s not just that there is a grey area between re-identification being ‘reasonably likely’ and ‘extremely remote’. The first test, as I interpret it, talks about the need to consider *all of the means* likely reasonably to be used, and then based on that analysis, come to a judgement about the overall likelihood of re-identification.
In other words, there are a range of possible re-identification techniques that might be applied, each of which have a certain probability of success, and you need to a) assess which of these are likely to be applied and be successfull, then b) adding the probability of these various re-identification scenarios together to give an overall probability of re-identification. Obviously putting hard numbers on these scenarios might not make sense but conceptually, that’s an approach that makes sense to me.
Thanks Reuben, and yes, I’d agree. I think the Anonymisation Code is generally quite good, and does deal with the fact that the techniques involved in anonymisation and reidentification are various. But the “reasonably likely” point potentially undermines it.