In the sidelines of the AI conversation: Model disgorgement and algorithm deletion

30 April 2025

How do model disgorgement and algorithm deletion work? Espie Angelica A. de Leon discusses how artificial intelligence training data may infringe IP or privacy laws and explores emerging remedies.

Front and centre of the conversation about artificial intelligence are copyright infringement and data misuse. Training data fed into the AI model may include works that are copyright protected or personal information whose very use breaches privacy laws. 

The terms “model disgorgement” and “algorithm deletion,” however, are not part of the mainstream conversation. At least not yet.

Model disgorgement and algorithm deletion are remediation techniques for AI systems, including machine learning models, that have been trained on copyrighted materials without the owner’s consent or on personal data protected by privacy laws.

How do they work? 

These terms are used interchangeably, along with other concepts such as model deletion, algorithmic disgorgement and algorithmic destruction. “Context is important to decide which term is most accurate,” noted Christopher J. Rourk, a partner at Jackson Walker in Dallas. 

The word “model” obviously refers to the AI model or machine learning model. But what is an algorithm, exactly? Rourk explained: “An algorithm is usually considered to be more self-contained than a model, like lines of functional computer code, whereas a model might include ‘weights’ or other data that are part of the algorithms in the model at each node of a plurality of nodes – say, in a neural network – where the weights are adjusted at multiple nodes in response to training data.” 
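To put that distinction in concrete terms, here is a toy sketch, not drawn from the article: the function below is the “algorithm,” while the “model” is that same code together with the weight values produced by training, and those values are what a disgorgement order may ultimately require deleting or rebuilding.

# Toy illustration of the algorithm/model distinction, using placeholder values.
import numpy as np

def layer_forward(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # The "algorithm": a fixed, self-contained computation at a layer of nodes.
    return np.maximum(0.0, x @ weights + bias)  # ReLU activation

# The "model" is the algorithm plus its fitted parameters. Real values would
# come from training on data; these zeros merely stand in for them.
weights = np.zeros((4, 3))
bias = np.zeros(3)
output = layer_forward(np.ones(4), weights, bias)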

Model disgorgement and algorithm deletion work by extracting or deleting illegitimate or unauthorized data from the AI system, making it appear as if such data had never been used to train the model in the first place. The mechanism can also extend to destroying the AI model or the algorithms within it, along with any products developed using the removed data.

In the U.S., the Federal Trade Commission (FTC) has begun ordering model disgorgement in certain cases.  

In 2019, the agency ordered Cambridge Analytica to destroy its algorithms in a data breach scandal that made headlines around the world. The London-based data analytics firm was found to have collected information about millions of Facebook users, via a personality profiling app called This Is Your Digital Life, and used it for political purposes. Most of these users did not consent to the use of their personal data. Cambridge Analytica ceased operations on May 1, 2018, in the midst of the scandal.

In March 2022, the FTC ordered WW International, formerly Weight Watchers International, to delete personal information it had collected from children under 13 without their parents’ consent. The agency also ordered the destruction of models or algorithms created using the children’s data.

In May 2023, it ordered edtech platform Edmodo to delete models or algorithms built on data it had collected from children without their parents’ permission. The California-based company harvested and used the data for advertising purposes.

These are just a few examples. 

According to Dalvin Chien, a partner at Mills Oakley in Sydney, model disgorgement and algorithm deletion were more practical during the early days of AI, when models were simpler. The models could be easily rebuilt, and the offending data could be easily identified and removed. Simpler AI systems also meant lower upfront costs, making it easier to start from scratch when circumstances warranted it.

However, AI models have evolved. They now have stronger capabilities and processing techniques, which means AI companies have been investing more resources and time into building them. Given these, remediation techniques have to keep pace. 

Here are some modern model disgorgement techniques, some of which are themselves AI-enabled:

Retraining. This technique involves removing the data and then retraining the AI model. However, the volume of information that has been used to train the model is constantly increasing, making this technique increasingly difficult. “Nonetheless, it could still be viable if a faster way of identifying and removing offending data, such as by another AI, from a dataset is developed,” said Malcolm Liu, a senior associate at Mills Oakley in Sydney. 
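As a minimal sketch of the retraining approach, assuming the offending records can be identified somehow, the snippet below filters them out and rebuilds the model from scratch; “is_offending” and “train_model” are hypothetical placeholders for whatever identification and training pipeline a provider actually uses.

# Retraining-based disgorgement in miniature: drop flagged records, retrain fully.
from typing import Any, Callable, Iterable, List

def retrain_without_offending(records: Iterable[Any],
                              is_offending: Callable[[Any], bool],
                              train_model: Callable[[List[Any]], Any]) -> Any:
    cleaned = [r for r in records if not is_offending(r)]
    # The whole model is rebuilt, so nothing learned from the removed data
    # survives in the weights, at the cost of a complete training run.
    return train_model(cleaned)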

Unlearning. The AI model is taught to remove the effects of the offending data, eliminating the need to manually search for and remove the data itself. “For example, if a particular outcome is derived or a particular effect is applied as a result of the AI model using the offending data, the AI model disregards or gives less consideration to that outcome or effect,” said Chien. Unlearning also allows the AI model to continue its development.
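One way published research approximates unlearning is by taking gradient steps that increase the loss on the offending examples, nudging the model away from what it learned from them. The PyTorch sketch below assumes a standard supervised setup and is only one of several strategies, not necessarily the specific technique Chien describes.

# Rough sketch of approximate unlearning via gradient ascent on a "forget" batch.
import torch

def unlearn_step(model: torch.nn.Module,
                 forget_inputs: torch.Tensor,
                 forget_targets: torch.Tensor,
                 loss_fn,
                 lr: float = 1e-4) -> None:
    model.train()
    model.zero_grad()
    loss = loss_fn(model(forget_inputs), forget_targets)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(lr * p.grad)  # ascend, not descend, the loss on the forget set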

Compartmentalization. “This is where instead of having a single AI model that is trained on a single but large dataset, multiple smaller AI models trained on smaller datasets are used instead, and their combined or averaged outputs are displayed. This way, if the offending data is located in the datasets of only one or a few of those smaller AI models, those AI models can be removed without substantially impacting the performance of the overall AI system,” Liu explained. 
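A minimal sketch of that compartmentalized setup appears below: the dataset is split into shards, one small model is trained per shard, and their outputs are averaged, so a shard found to contain offending data can simply be dropped or retrained on its own. “train_model” and “predict” are hypothetical placeholders.

# Compartmentalization in miniature: shard the data, train per shard, average outputs.
from statistics import mean
from typing import Any, Callable, List, Sequence

def train_sharded(records: Sequence[Any], n_shards: int,
                  train_model: Callable[[List[Any]], Any]) -> List[Any]:
    shards = [list(records[i::n_shards]) for i in range(n_shards)]
    return [train_model(shard) for shard in shards]

def ensemble_predict(models: List[Any], x: Any,
                     predict: Callable[[Any, Any], float]) -> float:
    return mean(predict(m, x) for m in models)  # combined/averaged output

def drop_shard_model(models: List[Any], offending_shard: int) -> List[Any]:
    # Only the model whose shard held the offending data is removed.
    return [m for i, m in enumerate(models) if i != offending_shard]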

According to Rourk, it is difficult to detect whether copyright-protected works or privacy-protected data were used to train an AI model. He added that it may even be an open legal question whether copyright infringement or a violation of data privacy has occurred at all when there is no way for the training data to be output. A case in point is generic object recognition in image data.

Even so, Rourk said enforcement of model disgorgement and algorithm deletion orders is easier for generative AI. “So, for AI that does something like recognize people, it could be difficult to tell if it used privacy-protected information of a person for training, such as when there is another source of information for that person that is not private,” he said.

“For example, let’s consider the iconic photo of Albert Einstein sticking out his tongue, which was not copyrighted or private, but pretend that it was. If that photo was used to train an AI model for scanning faces for entry authorization, it would be very difficult to determine whether it was used, even if Einstein stood in front of the camera and stuck out his tongue because there are many other photos of Einstein that could also have been used to train that AI for that function. On the other hand, if that photo was used to train generative AI and if that generative AI generated that image of Einstein with sunglasses or a funny hat, then it would be easier to identify that the ‘copyrighted’ image was used.” 

Rourk mentioned an Andy Warhol painting as another example. He said: “For example, a neural network that processes image data like TensorFlow can be trained to identify objects in image data, like a soup can. If an Andy Warhol painting of a soup can was used to train a TensorFlow model for identifying objects in image data, that use might be difficult to detect because the training data would only be used to identify soup cans, and not to output images that might be similar to the Andy Warhol soup can. Even if the TensorFlow model was trained to identify the specific Andy Warhol painting, it is arguable whether any copyright infringement may have occurred, because the test for copyright infringement is whether there is substantial similarity between the copyrighted work and the potentially infringing work.

“However, for generative AI, the output could include image data with some of the recognizable elements from the Andy Warhol painting and might be considered substantially similar to the copyrighted Andy Warhol painting,” Rourk said. 
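As a toy illustration of why generative output makes detection easier, and emphatically not a stand-in for the legal test of substantial similarity, a generated image can be compared against a protected work with a perceptual hash; the file names and threshold below are hypothetical.

# Toy similarity check: a small Hamming distance between perceptual hashes means
# the generated image closely resembles the protected work and merits review.
import imagehash
from PIL import Image

protected = imagehash.phash(Image.open("warhol_soup_can.png"))  # hypothetical file
generated = imagehash.phash(Image.open("model_output.png"))     # hypothetical file

if protected - generated <= 8:  # hash subtraction yields a Hamming distance
    print("Generated output closely resembles the protected work; flag for review.")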

It’s the same with data privacy. Whether data protected by privacy laws was used to train the AI system would be difficult to determine if it isn’t generative AI. 

For Chien and Liu, prevention is always better than cure. The more money and time spent on developing an AI model, the harder it becomes to destroy and rebuild it. Thus, it is a lot easier and more cost-effective to prevent the AI model from using or producing illegally obtained data in the first place.  

“To this end, laws and regulations addressing AI, including their development and use, are being developed, updated and implemented so that suppliers of AI products are held to minimum compliance standards that ensure privacy rights and copyright are protected and that AI models are developed and used in a manner that is consistent with privacy and copyright principles,” said Chien. 

“The European Union Artificial Intelligence Act takes us some way there. Even in the U.S., where several cases enforcing AI suppliers to use model disgorgement on their AI products have already occurred, these enforcements have been made indirectly through general prohibitions on ‘unfair or deceptive acts or practices,’ as opposed to breaches of AI-specific laws,” Liu added. 

In China, several key administrative regulations relate to AI and machine learning models, model disgorgement and algorithm deletion. 

One of these is the Provisions on Deep Synthesis Internet Information Services. Article 14 states that when illegal content is discovered, service providers must take immediate measures to stop generating such content, conduct model optimization training for correction and report to the relevant authorities.

Another is the Administrative Provisions on Generative Artificial Intelligence Services. Article 7 requires the use of legally sourced data and base models, IP rights protection and personal consent when handling personal information. 

“Based on China’s regulatory framework, several solutions are implemented. There are preventive measures, technical controls and an ethics review system,” said Ji Liu, director for patent litigation at CCPIT Patent & Trademark Law Office in Beijing. 

Preventive measures are embodied in Article 8 of the Administrative Provisions on Generative Artificial Intelligence Services, which requires clear data annotation rules and quality assessment, and Article 10, which requires technical measures to audit input data and synthetic results. 

The Provisions on Deep Synthesis Internet Information Services provide for technical controls. Articles 16-17 mandate content marking systems and require preserving relevant network logs.

Meanwhile, the Administrative Measures for Scientific and Technological Ethics Review (Trial) establishes ethics review committees at institutions and regular tracking reviews of high-risk activities. 

“As AI products become increasingly mainstream in the global economy today and in the near future, and as different models are developed to address the use of unauthorized or illegal data, there is no doubt that laws and regulations governing the AI space will develop and be enacted rapidly,” Chien said. “We recommend companies that supply or deploy AI products in their service or product offerings be on the front foot with respect to AI, privacy and cybersecurity compliance to ensure they are not penalized or have to undertake substantial remediation measures when these laws and regulations come into play.” 

Model disgorgement and algorithm deletion – along with algorithmic disgorgement, algorithmic destruction and model deletion for that matter – may be on the sidelines of the AI conversation. Yet, these remediation techniques cannot be set aside completely. AI developers and suppliers should learn about them and know what these mechanisms entail. After all, the world is well into the race for AI development and adoption. Copyright infringement and data misappropriation have become part of the race. 

