Machine learning lawful basis on a case-by-case approach – really?

The Information Commissioner’s Office has published its response to the government’s consultation on Copyright and AI. There’s an interesting example in it of an “oh really?!” statement.

The government proposes that, when it comes to text and data-mining (TDM) of datasets that contain copyright works, a broad exception to copyright protection should apply, under which “AI developers would be able to train on material to which they have lawful access, but only to the extent that right holders had not expressly reserved their rights”. Effectively, rights holders would have to opt out of “allowing” their works to be mined.

This is highly controversial, and may be the reason that the Data (Use and Access) Bill has stalled slightly in its passage through Parliament. When the Bill was in the Lords, Baroness Kidron successfully introduced a number of amendments in relation to use of copyright info for training AI models, saying that she feared that the government’s proposals in its consultation “would transfer [rights holders’] hard-earned property from them to another sector without compensation, and with it their possibility of a creative life, or a creative life for the next generation”. Although the government managed to get the Baroness’s amendments removed in Commons’ committee stage, the debate rumbles on.

The ICO’s response to the consultation notes the government’s preferred option of a broad TDM exception, with opt-out, but says that, where personal data is contained in the training data, such an exception would not “in and of itself constitute a determination of the lawful basis for any personal data processing that may be involved under data protection law”. This must be correct: an Article 6(1) UK GDPR lawful basis will still be required. But it goes on to say “the lawfulness of processing would need to be evaluated on a case-by-case basis”. A straightforward reading of this is that for each instance of personal data processing when training a model on a dataset, a developer would have to identify a lawful basis. But this, inevitably, would negate the whole purpose of using machine learning on the data. What I imagine the ICO intended to mean was that a developer should identify a broad, general lawful basis for each dataset. But a) I don’t think that’s what the words used mean, and b) I struggle to reconcile that approach with the fact that a developer is very unlikely to know exactly what personal data is in a training dataset, before undertaking TDM – so how can they properly identify a lawful basis?

I should stress that these are complex and pressing issues. I don’t have answers. But opponents of the consultation will be likely to jump on anything they can.

The views in this post (and indeed most posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.

Filed under AI, Data Protection, datasets, DUAB, Information Commissioner, Lawful basis, parliament, Uncategorized

Unreasonably accessible – ICO and misapplication of s21?

I’ll start with a simple proposition: if a dataset is made publicly available online by a public authority, but some information on it is withheld – by a deliberate decision – from publication, then the total dataset is not reasonably accessible to someone making an FOI request for information from it.

I doubt that any FOI practitioners or lawyers would disagree.

Well, sit back and let me tell you a story.

In November 2023 the Information Commissioner’s Office (ICO) refused to disclose information in response to a Freedom of Information request, on the grounds that the exemption at section 21 of the Freedom of Information Act 2000 (FOIA) applied: the information was “reasonably accessible to the applicant” without his needing to make a FOIA request.

The request was, in essence, for “a list…of the names of all the UK parish councils that have received 20 or more ICO Decision Notices (for FOIA cases only) since 1st January 2014”. The refusal by the ICO was on the basis that

the search function on the decision notice section of the ICO website returned 415 decision notices falling within the scope of the complainant’s request…[therefore] it is possible to place the names of the parish councils into an Excel sheet and then establish quickly how many decision notices relate to each individual parish council.

The ICO noted that, when it comes to the application of section 21

It is reasonable for a public authority to assume that information is reasonably accessible to the applicant as a member of the general public until it becomes aware of any particular circumstances or evidence to the contrary [emphasis added]

On appeal to the Information Tribunal, the ICO maintained reliance on the exemption, saying that all the applicant needed to do was to go to the ICO website and “look at each entry and count-up [sic] the numbers of [Decision Notices] against each parish council”. The Tribunal agreed: the ICO had provided the requester

with a link to the correct page of the ICO website, and instructing him how to use the search function. These instructions have enabled him to identify from the tens of thousands of published decision notices those 415-420 notices which have been issued to parish councils over the past decade or so

All straightforward, if one’s analysis is predicated on an assumption that the ICO’s public Decision Notice database is a complete record of all decision notices.

But it isn’t.

I made an FOI request of my own to the ICO, asking how many Decision Notices do not appear on the database. The answer is 45. A number of possible reasons are given (such as that sensitive information was involved, or that there was agreement by the parties not to publish). But the point is stark: the Decision Notice database is not a complete record of all Decision Notices issued. And I do not see how it is possible for the ICO to rely on section 21 FOIA in circumstances like those in this case. It is plainly the case that the ICO knew (or was likely reckless in not knowing) that there were “particular circumstances or evidence” which showed that the information could not have been reasonably accessible to the applicant.

Of course, it is quite likely (perhaps inevitable) that the 45 unpublished Decision Notices would make no difference at all to a calculation of how many UK parish councils have received 20 or more Decision Notices since 1st January 2014. But that really isn’t the point. The ICO could have come clean – could have done the search itself and added in the 45 unpublished notices. It knew they existed, but for some reason thought it didn’t matter.
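For what it is worth, the counting exercise the ICO described can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the function names and the sample council names are invented, and the second function takes a deliberately worst-case view, assuming nothing about which authorities the 45 unpublished notices relate to (or even whether they are FOIA notices to parish councils at all). It shows only why a tally of the published notices cannot, on its own, be certain.

```python
from collections import Counter

THRESHOLD = 20    # "20 or more" Decision Notices, as in the request
UNPUBLISHED = 45  # number of unpublished notices, per the ICO's reply to me


def councils_at_or_above(published, threshold=THRESHOLD):
    """Tally published notices per council; keep those meeting the threshold."""
    counts = Counter(published)
    return {name: n for name, n in counts.items() if n >= threshold}


def councils_in_doubt(published, unpublished=UNPUBLISHED, threshold=THRESHOLD):
    """Councils below the threshold on published figures that could still
    reach it if enough of the unpublished notices happened to be theirs
    (a worst-case bound, not an estimate)."""
    counts = Counter(published)
    return {
        name: n
        for name, n in counts.items()
        if n < threshold and n + unpublished >= threshold
    }


# Invented data: these council names and counts are hypothetical.
sample = ["Council A"] * 22 + ["Council B"] * 18 + ["Council C"] * 3

print(councils_at_or_above(sample))  # {'Council A': 22}
print(councils_in_doubt(sample))     # {'Council B': 18, 'Council C': 3}
```

On the published figures alone only the first council qualifies, but without knowing where the unpublished notices fall, the other two cannot strictly be ruled out. Only the ICO is in a position to close that gap.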

The ICO is the regulator of FOIA, as well as being a public authority itself under FOIA. It has to get these things right. Otherwise, why should any other public authority feel the need to comply?

The views in this post (and indeed most posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.

Filed under access to information, datasets, Freedom of Information, Information Commissioner, Information Tribunal, section 21

(Data)setting an example

Is the ICO failing to comply with its own obligations under FOI law?

Some UK regulators are subject to the laws or rules they themselves oversee and enforce. Thus, for example, the Advertising Standards Authority should avoid advertising its services in contravention of its own code of advertising practice, the Environment Agency should avoid using a waste carrier who is not authorised to carry waste, and the Information Commissioner (ICO) – as a public authority under Schedule 1 of the same – should not breach the Freedom of Information Act 2000 (FOIA). However, I think I can point to numerous examples (I estimate there are 57 on its own website at the time of writing this) where the last has done precisely this, possibly unknowingly, or – if knowingly – with no contrition whatsoever.

In 2012 sections 11 and 19 of FOIA were amended by the Protection of Freedoms Act 2012 (POFA). POFA inserted into FOIA what are colloquially known as the “dataset provisions”. For our purposes here, what these say is that

Under its publication scheme a public authority should publish datasets that have been requested [under FOIA], and any updated versions it holds, unless it is satisfied that it is not appropriate to do so.

In short – and I take the wording above from ICO’s own guidance – if someone asks ICO for a dataset under FOIA, ICO must disclose it, put it on its website, and regularly update it (unless it is “not appropriate” to do so).

“Dataset” has a specific, and rather complex, meaning under POFA, and FOIA. However, the ICO’s own guidance nicely summarises the definition:

A dataset is a collection of factual information in electronic form to do with the services and functions of the authority that is neither the product of analysis or interpretation, nor an official statistic and has not been materially altered.

So, raw or basic data in a spreadsheet, relating to an authority’s functions, would constitute a dataset, and, if disclosed under FOIA, would trigger the authority’s general obligation to publish it on its website and regularly update it.

Yet, if one consults the ICO’s own disclosure log (its website page listing FOI responses it has made “that might be of wider public interest”), one sees multiple examples of disclosures of datasets under FOI. In fact, one can even filter the results to separate dataset disclosures from others, which is how I got my figure of 57 mentioned above. But it appears that none of these has ever been updated, as section 19(2A)(a)(ii) of FOIA requires.

Some of the disclosures on there are of datasets which are indeed of public interest. Examples are: information on how many FOI etc requests ICO itself receives, and how timeously it handles them; information on the numbers and types of data breach reports ICO receives, and from which sectors; information on how many monetary penalties have been paid/recovered.

It’s important to note that these 57 disclosures are only those which ICO has chosen, because they are “of wider public interest”, to publish on its website. There may well be – no doubt are – others.

But if these dataset disclosures are, as declared, of wider public interest, I cannot see that ICO could readily claim that its reason for not updating them is because it is “not appropriate” to do so.

It may be that ICO feels, as some people have suggested, that the changes to FOIA wrought by POFA might not have met any pressing public demand for amended dataset-access provisions, and, therefore, compliance with the law is all a bit pointless. But there would be two problems with this, were it the case. Firstly, ICO is uniquely placed to comment on and lobby for changes to the law – if it thinks the dataset provisions are not worth being law, then why does it not say so? Secondly, as the statutory regulator for FOIA, and a public authority itself subject to FOIA, it is simply not open to it to disregard the law, even were it to think the law was not worth regarding.

The views in this post (and indeed all posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.

Filed under access to information, datasets, Freedom of Information, Information Commissioner