The Hackney Gazette reports that details of 15,000 residents have been published on the internet after Hackney Council apparently inadvertently disclosed the data when responding to a Freedom of Information (FOI) request made using the WhatDoTheyKnow site.
This is not the first time that such apparently catastrophic inadvertent disclosures have happened through WhatDoTheyKnow, and, indeed, in 2012 MySociety, who run the site, issued a statement following a similar incident with Islington Council. As that made clear
responses sent via WhatDoTheyKnow are automatically published online without any human intervention – this is the key feature that makes this site both valuable and popular
It is clearly the responsibility of the authorities in question to ensure that no hidden or exempt information is included in FOI disclosures via WhatDoTheyKnow, or indeed, in FOI disclosures in general. A failure to have appropriate organisational and technical safeguards in place can lead to enforcement action by the Information Commissioner’s Office for contraventions of the Data Protection Act 1998 (DPA): Islington ended up with a monetary penalty notice of £70,000 for their incident, which involved 2000 people. Although the number of data subjects involved is not the only factor the ICO will take into account when deciding what action to take, it is certainly a relevant one: 15000 affected individuals is a hell of a lot.
What concerns me is this sort of thing keeps happening. We don’t know the details of this incident yet, but with such large numbers of data subjects involved it seems likely that it will have involved some sort of dataset, and I would not be at all surprised if it involved purportedly masked or hidden data, such as in a pivot table [EDIT – I’m given to understand that this incident involved cached data in MS Excel]. Around the time of the Islington incident the ICO’s Head of Policy Steve Wood published a blog post drawing attention to the risks. A warning also takes the form of a small piece on a generic page about request handling, which says
take care when using pivot tables to anonymise data in a spreadsheet. The spreadsheet will usually still contain the detailed source data, even if this is hidden and not immediately visible at first glance. Consider converting the spreadsheet to a plain text format (such as CSV) if necessary.
This is fine, but does it go far enough? Last year I wrote on the Guardian web site, and called for greater efforts to be made to highlight the issue. I think that what I wrote then still holds
The ICO must work with the government to offer advice direct to chief executives and those reponsible for risk at councils and NHS bodies (and perhaps other bodies, but these two sectors are probably the highest risk ones). So far these disclosure errors do not appear to have led to harm to those individuals whose private information was compromised, but, without further action, I fear it is only a matter of time.
Time will tell whether this Hackney incident results in a finding of DPA contravention, and ICO enforcement, but in the interim I wish the word would get spread around about how to avoid disclosing hidden data in spreadsheets.
The views in this post (and indeed all posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.
3 responses to “Hidden data in FOI disclosures”
One of the folk doing sterling work in locating and hiding the data accidentally uploaded to What Do They Know told me that one of the simplest ways to avoid this kind of incident is to look at the size of any attachment being disclosed under FOI. Most of the errant spreadsheets are large (several Mb in size), and that in itself is enough to know that there is a lot more in the document than intended. Definitely something to watch out for.
That blog post by Steve Wood is still available. It’s just been buried in ICO’s recent site redesign:
Ah – thanks Owen, I had a quick search around but couldn’t find it. Will amend accordingly.