Key legal issues connected with the way OpenAI’s ChatGPT processes information and meets the requirements of EU privacy legislation remain unresolved a year after a European Data Protection Board (EDPB) taskforce began to examine them. A report outlining the taskforce’s preliminary conclusions illustrates the complexity of regulating emerging large language model (LLM) technology and the regulatory risk facing the companies developing it.
The report comes against a backdrop of complaints and investigations into whether ChatGPT breaches EU privacy law. In February, Italy’s Data Protection Authority said it suspected ChatGPT of breaching GDPR Articles 5, 6, 8, 13 and 25. There is also an ongoing investigation in Poland over alleged breaches of privacy law, and another in Austria.
Personal data
The taskforce report points out that “each processing of personal data must meet at least one of the conditions specified in Article 6(1) and, where applicable, the additional requirements laid out in Article 9(2) GDPR.” And it says that assessment of whether the processing of personal data is lawful can usefully be done by splitting the operation into stages. These are:
- collection of training data;
- pre-processing of the data;
- training;
- prompts and ChatGPT output; and
- training ChatGPT with prompts.
Put simply, under GDPR any entity that wants to process data about people must have a legal basis to do so. Italy’s DPA has told OpenAI that it cannot simply rely on performance of a contractual obligation to claim a legal basis for processing personal data. That means it must either gain explicit consent from users to use their data, or meet a test of legitimate interest (LI).
Reliance on LI means OpenAI would need to demonstrate that it has a need to process the data, that the processing it carries out is necessary to meet that need, and that it has carried out a balancing test that weighs its own legitimate interest against the rights and freedoms of the individuals whose data it is processing. In all of this, says the report, “The reasonable expectations of data subjects should be taken into account.” And, it adds, “measures should be in place to delete or anonymise personal data that has been collected via web scraping before the training stage.”
Where input to, output from and training of ChatGPT are concerned, the report expresses the view that, even though OpenAI provides the option to opt out of the use of content (data), “Data subjects should, in any case, be clearly and demonstrably informed that such ‘Content’ may be used for training purposes.”
Hallucination
The section on fairness is particularly interesting as it tackles the challenges posed by so-called “hallucination” – where LLMs generate factually inaccurate responses. It opens by saying that “the principle of fairness pursuant to Article 5(1)(a) GDPR is an overarching principle which requires that personal data should not be processed in a way that is unjustifiably detrimental, unlawfully discriminatory, unexpected or misleading to the data subject.” This means, it goes on, that “controllers should not transfer the risks of the enterprise to data subjects.”
The key observation here is that “if ChatGPT is made available to the public, it should be assumed that individuals will sooner or later input personal data. If those inputs then become part of the data model and, for example, are shared with anyone asking a specific question, OpenAI remains responsible for complying with the GDPR and should not argue that the input of certain personal data was prohibited in first place.”
The taskforce acknowledges that OpenAI has set out how it intends to address these issues, but emphasises that it has yet to examine those proposals.
Rights of data subjects
One eye-catching section relates to what the taskforce terms “probabilistic output creation mechanisms”. It says the controllers of LLMs must, in the information they provide to users, include “explicit reference to the fact that the generated text, although syntactically correct, may be biased or made up.”
The report also says that “it is imperative that data subjects can exercise their rights in an easily accessible manner”, and notes that “OpenAI suggests users to shift from rectification to erasure when rectification is not feasible due to the technical complexity of ChatGPT.” This seems to imply – and the cases in Poland and Austria seem to back this up – that users can’t get incorrect personal information about them corrected, but can prevent it being further generated.
Reports on outstanding cases indicate that various national data protection authorities were waiting for the taskforce to complete its work and make recommendations. These are only preliminary conclusions, but the line stating that controllers should “implement appropriate measures designed to implement data protection principles in an effective manner” suggests that the taskforce is not particularly close to providing the advice those authorities are waiting for.