The Information Commissioner’s Office has published its response to the government’s consultation on Copyright and AI. There’s an interesting example in it of an “oh really?!” statement.
The government proposes that, when it comes to text and data-mining (TDM) of datasets that contain copyright works, a broad exception to copyright protection should apply, under which “AI developers would be able to train on material to which they have lawful access, but only to the extent that right holders had not expressly reserved their rights”. Effectively, rights holders would have to opt out of “allowing” their works to be mined.
This is highly controversial, and may be the reason that the Data (Use and Access) Bill has stalled slightly in its passage through Parliament. When the Bill was in the Lords, Baroness Kidron successfully introduced a number of amendments in relation to the use of copyright works for training AI models, saying that she feared that the government’s proposals in its consultation “would transfer [rights holders’] hard-earned property from them to another sector without compensation, and with it their possibility of a creative life, or a creative life for the next generation”. Although the government managed to get the Baroness’s amendments removed at Commons committee stage, the debate rumbles on.
The ICO’s response to the consultation notes the government’s preferred option of a broad TDM exception, with opt-out, but says that, where personal data is contained in the training data, such an exception would not “in and of itself constitute a determination of the lawful basis for any personal data processing that may be involved under data protection law”. This must be correct: an Article 6(1) UK GDPR lawful basis will still be required. But it goes on to say “the lawfulness of processing would need to be evaluated on a case-by-case basis”. A straightforward reading of this is that for each instance of personal data processing when training a model on a dataset, a developer would have to identify a lawful basis. But this, inevitably, would negate the whole purpose of using machine learning on the data. What I imagine the ICO intended to mean was that a developer should identify a broad, general lawful basis for each dataset. But a) I don’t think that’s what the words used mean, and b) I struggle to reconcile that approach with the fact that a developer is very unlikely to know exactly what personal data is in a training dataset, before undertaking TDM – so how can they properly identify a lawful basis?
I should stress that these are complex and pressing issues, and I don’t have answers. But opponents of the consultation are likely to jump on anything they can.
The views in this post (and indeed most posts on this blog) are my personal ones, and do not represent the views of any organisation I am involved with.
