

Machine Unlearning – A Potential Option for Remedy?

Machine unlearning (MU) is a concept that is likely more interesting to lawyers than to data scientists, as the latter typically focus on collecting, maintaining, and mining as much data as possible—whether in the context of traditional data analytics or Generative AI (GenAI) model training. Unless the data is outrageously inaccurate or biased—so much so that it presents unfixable statistical distortions and pollutes the generative model to the extent that the entire model needs to be retrained—data scientists are generally not thinking about erasing, deleting, or forgetting data. For lawyers, however, erasing, forgetting, and unlearning hold significance for at least two reasons.

First, privacy laws and regulations include a right to erasure or deletion (e.g., GDPR Art. 17 and CCPA § 1798.105), which allows individuals to request that businesses delete their personal data and, under the GDPR, take reasonable steps to inform third parties that have received it. In the context of GenAI, honoring such a request could require a generative model to “forget” the data it has learned.

Second, one of the fundamental principles of legal remedies is restoring something to its original state, known as “restitution.” For example, in cases of real property trespass, restitution involves vacating the land; in larceny, it requires returning stolen property to its rightful owner. Similarly, in a privacy intrusion case involving generative models trained on private data, restitution might require the removal of that data from the model’s knowledge base—a significantly more challenging task than one might assume.

As explained in this DarkReading.com article, unlearning is far more complex than merely deleting or removing training data: “Under GDPR, deleting personal data is like picking carrots out of a salad. But deleting data from a trained LLM is more like trying to retrieve a whole strawberry from a smoothie.” Recent research suggests the problem is even harder than the analogy implies: a blender, at least, cannot produce new strawberries in its next batch, regardless of whether the old strawberry was removed. A generative model, by contrast, can still produce outputs that resemble the original training data even after that data is removed, because knowledge of the deleted data can be revived through user prompts or related tasks. For more, see this research paper: https://arxiv.org/pdf/2412.06966.
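To make the smoothie analogy concrete, consider the following toy sketch (a hypothetical scikit-learn example, not taken from the cited paper). Erasing a record from storage accomplishes the “carrot” kind of deletion, but it leaves the trained model’s parameters, and therefore its learned behavior, completely untouched; only retraining on the scrubbed data actually changes them.

```python
# Toy illustration (assumed example): deleting a training record from storage
# does not remove its influence from a model that has already been trained.
from sklearn.linear_model import LogisticRegression

# Training set in which the last row is a "private" record
# that the data subject later asks to have erased.
X = [[0.0], [1.0], [2.0], [3.0], [2.5]]
y = [0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
weights_before = model.coef_.copy()

# GDPR-style erasure: the raw record disappears from the dataset...
X_erased, y_erased = X[:-1], y[:-1]

# ...but the already-trained model is untouched; its weights still
# encode whatever it learned from the erased record.
print((weights_before == model.coef_).all())   # True: nothing was unlearned

# Only retraining on the scrubbed dataset yields a genuinely different
# model -- the expensive option discussed below.
retrained = LogisticRegression().fit(X_erased, y_erased)
print((retrained.coef_ == model.coef_).all())  # False: retraining changed it
```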

The cleanest method of unlearning for generative models appears to be retraining from scratch on a dataset with the offending records removed, but this process is prohibitively expensive and nearly impossible for large language models (LLMs). Scientists are actively exploring more feasible methods that achieve unlearning without full retraining. For recent developments, see an article published in Nature: https://www.nature.com/articles/s42256-025-00985-0.
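To illustrate one direction this research has taken, here is a heavily simplified, hypothetical sketch of approximate unlearning by gradient ascent: briefly fine-tuning a trained model to increase its loss on the data to be forgotten, while anchoring its behavior on the retained data. This toy PyTorch example illustrates the general idea only; it is not the method described in the Nature article, and practical systems must bound the ascent and then verify empirically that the “forgotten” knowledge does not resurface.

```python
# Hypothetical toy sketch of approximate unlearning via gradient ascent.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny regression network standing in for a large generative model.
model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Retained data follows y = x**2; the "forget set" is a single
# out-of-pattern private record a data subject has asked to erase.
x_retain = torch.linspace(-1, 1, 64).unsqueeze(1)
y_retain = x_retain ** 2
x_forget = torch.tensor([[0.5]])
y_forget = torch.tensor([[2.0]])

# 1) Ordinary training on everything, including the private record.
x_all, y_all = torch.cat([x_retain, x_forget]), torch.cat([y_retain, y_forget])
for _ in range(500):
    opt.zero_grad()
    loss_fn(model(x_all), y_all).backward()
    opt.step()
print("before unlearning:", model(x_forget).item())  # pulled toward 2.0

# 2) Approximate unlearning: ascend the loss on the forget set while
#    keeping the fit on the retained data as an anchor.
for _ in range(100):
    opt.zero_grad()
    forget_loss = loss_fn(model(x_forget), y_forget)
    retain_loss = loss_fn(model(x_retain), y_retain)
    (retain_loss - forget_loss).backward()  # maximize loss on forgotten data
    opt.step()
print("after unlearning:", model(x_forget).item())  # pushed away from 2.0
```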

As computer scientists continue to advance machine unlearning, the legal community should closely monitor its development. MU as a remedy has not yet appeared in legal opinions, but the right to erasure under privacy law and the right to restitution in property law could lead to its adoption, particularly if unlearning becomes affordable and practical.


Tags

intellectual property, artificial intelligence, ai, privacy