Sensitive personal data exposed in Open Datasets

Since August last year I’ve been inviting the ICO to consider the issue of deliberate wholesale exposure of sensitive personal data in local authority open data. It’s still online.

UPDATE: 16.02.15 Well, I was wrong. The ICO says this is not personal data:

The data sets in question are clearly personal data in the hands of the [redacted] because it will retain the full original dataset containing the identifying details of individuals. However, the question is whether the information is still personal data post-publication. In our view it is not.

Although the data relates to particular living individuals, it does not in itself identify any of them and so in of themselves the data sets do not contain personal data.

The issue then is whether it is likely that a third party will come into the possession of other information that will allow an individual to be identified. To do so, such a person would already need prior knowledge of any given individual in order to identify them. However, we believe the publication of the information to have a low risk of individuals being re-identified because only someone with considerable prior knowledge would be able to perform this task.

We note that you have not identified anybody and the [redacted] has stated that it is unaware of any cases of re-identification as a result of publication.

I honestly struggle to fathom this. I accept the ICO’s further point that

the Information Commissioner only has to give his view about the likelihood that there has been a breach [of the DPA]. This view is made on the balance of probabilities and the Commissioner is under no obligation to prove this beyond doubt

but their assessment doesn’t seem to tally with my understanding of the techniques described in the ICO’s own Anonymisation Code. I’m no expert on that subject, but I wouldn’t dream of publishing the datasets in question, in the form they have been published. If anyone has any observations I’d be really interested to hear them.

And I’m still not linking to the datasets – I think they can identify individuals, and their sensitive personal data.


Imagine, if you will, a public authority which decides to publish as Open Data a spreadsheet of 6000 individual records of adults receiving social services support. Each row tells us an individual service user’s client group (e.g. “dementia” or “learning disability”), age range (18-64, 65-84, 84 and over), the council ward they live in, the service they’re receiving (e.g. “day care” or “direct payment” or “home care”), their gender and their ethnicity. If, by burrowing into that data, one could find that one, and only one, Bangladeshi man in the Blankety ward aged 18-64 with a learning disability is in receipt of direct payments, most data protection professionals (and many other people besides) would recognise that this is an identifiable individual, if not to you or me, then almost certainly to some of his neighbours or family or acquaintances.

Similarly, imagine the same public authority decides to publish as Open Data a spreadsheet of nearly 7000 individual records of council housing tenants who have received Notices of Seeking Possession or Notices to Quit. Each row tells us the date the individual tenant was served the notice, the council ward, the duration of the tenancy, whether it was joint or sole, the age of the tenant(s) in years, their gender, their ethnicity (if recorded), their disability status (if recorded) and their vulnerability status (if recorded). If, by burrowing into that data, one could find that one, and only one, 40-year-old Asian Indian male sole tenant with a tenancy 2.94 years old was served a Notice of Seeking Possession in June 2006, most data protection professionals (and many other people besides) would recognise that this is an identifiable individual, if not to you or me, then almost certainly to some of his neighbours or family or acquaintances.
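To make the point concrete, here is a minimal Python sketch of the kind of check that anyone with a spreadsheet could run: count how many rows share each combination of attributes, and list the combinations that occur exactly once. The rows, ward names and function names below are invented for illustration and are not taken from the published datasets. The size of the smallest group is what discussions of k-anonymity call "k", and a value of 1 means at least one person is singled out.

```python
from collections import Counter

# Invented rows for illustration only; not taken from any published dataset.
# Columns: client group, age range, ward, service, gender, ethnicity.
rows = [
    ("dementia", "65-84", "Ward A", "home care", "F", "White British"),
    ("dementia", "65-84", "Ward A", "home care", "F", "White British"),
    ("dementia", "65-84", "Ward A", "home care", "F", "White British"),
    ("learning disability", "18-64", "Ward B", "day care", "M", "White British"),
    ("learning disability", "18-64", "Ward B", "day care", "M", "White British"),
    ("learning disability", "18-64", "Ward B", "direct payment", "M", "Bangladeshi"),
]


def unique_combinations(records):
    """Return every combination of attribute values that occurs exactly once."""
    counts = Counter(records)
    return [combo for combo, n in counts.items() if n == 1]


def smallest_group(records):
    """Size of the smallest group of identical rows (the 'k' in k-anonymity)."""
    return min(Counter(records).values())


singled_out = unique_combinations(rows)
for combo in singled_out:
    print("Only one person matches:", combo)
print("Smallest group size:", smallest_group(rows))
```

A check like this says nothing about who the person is, but it shows how quickly a handful of ordinary-looking columns can narrow a row down to one individual, which is all that a neighbour or relative with some prior knowledge would need.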

If these individuals are identifiable (and, trust me, these are only two examples from hundreds, in many, many spreadsheets), then this is their sensitive personal data which is being processed by the public authority in question (which I am not identifying, for obvious reasons). For the processing to be fair and lawful it needs a legal basis, by meeting at least one of the conditions in Schedule 2 and one in Schedule 3 of the Data Protection Act 1998 (DPA).

And try as I might, I cannot find one which legitimises this processing, not even in the 2000 Order which significantly added to the Schedule 3 conditions. And this was why, when the datasets in question were drawn to my attention, I flagged my concerns up with the public authority:

Hi – I notice you’ve uploaded huge amounts of data…some of it at a very high level of granularity – ie with multiple and specific identifiers. According to the definitions in recital 26 and Article 2 of Directive 95/46/EC, s1(1) of the Data Protection Act 1998, and the Information Commissioner’s Office guidance (eg “Determining What is Personal Data” and the Code of Practice on Anonymisation) this is very likely to be personal data and in many cases sensitive personal data. I’m curious to know why you are publishing such datasets in such form, and what the legal basis is to do so

Not receiving any reply, I then contacted the Information Commissioner’s Office, saying:

It seems to me that they are processing (including disclosing) large amounts of sensitive personal data. I’m happy to elaborate to ICO if you want, but presume I wouldn’t need to explain exactly why I am concerned.

However, when I received the ICO case worker’s reply, I was rather dumbfounded:

You have raised concerns that [redacted] is disclosing large amounts of sensitive personal data on…its website. For information to be personal data it has to relate to a living individual and allow that individual to be identified from the information. I have looked over some of the information…and it appears to be sharing generic data and figures. I could not see any information that identifies any individuals. In order to consider your concerns further it would be extremely helpful if you could provide some examples of where the sensitive personal data can be found and possibly provide a couple of screenshots.

Nonetheless, I replied, giving the two examples above, and the case worker further replied:

I have now looked at the examples you have provided and agree that there is the potential for individuals to be identified from the information that [they are] publishing. We will now write to [them] about this matter to obtain some further information about its information rights practices. As this matter does not concern your personal data and relates to third party information we do not intend to write to you again about this matter

I thought the last sentence was a bit odd (nothing prevented them from keeping me informed) but took reassurance that the data would be removed or appropriately anonymised.

But nothing seemed to happen. So I chased the ICO at the end of November. No response. And now I’ve been forced to raise it with the ICO as a complaint:

I understand that you said you would not contact me again about this, but I note that the sensitive personal data is still online. I advise several public sector clients about the online publishing of datasets, with reference to the law and ICO guidance, and the lack of action on this…leaves me quite bemused – do I now advise clients that they are free to publish datasets with such specific and so many identifiers that individuals can be identified? If so, what legal basis do I point to to legitimise the processing?

Public authorities are increasingly being encouraged, as part of the transparency agenda, to make their data publicly available, and to make it available in a reusable format, so that it can be subjected to analysis and further use. The ICO has produced generally helpful guidance on successful anonymisation, which enables personal data to be removed from datasets. If public authorities fail to follow this guidance, and instead disclose sensitive personal data within those reusable datasets, they are potentially exposing individuals to considerable and various risks of harm. Moreover, much of the data in question is gathered pursuant to the public authority’s statutory duties – in other words, data subjects have no ability to opt out, or refuse to give consent to the processing.

One has to ask what this does for the confidence of data subjects in Open Data and the transparency agenda.

I asked the ICO’s always very helpful press office if they wanted to comment, and an ICO spokesperson said: “This is an open case, and we continue to work with the council to explain our concerns about the amount of information being published.” Which raises interesting questions – if they have concerns (and I think I have amply explained here why those concerns are justified) why not take enforcement action to get the data taken down?

The views in this post (and indeed all posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.
